Article 3775 of comp.infosystems.gopher:
Xref: feenix.metronet.com comp.infosystems.gopher:3775
Path: feenix.metronet.com!news.utdallas.edu!hermes.chpc.utexas.edu!cs.utexas.edu!math.ohio-state.edu!howland.reston.ans.net!ux1.cso.uiuc.edu!not-for-mail
From: grady@ux1.cso.uiuc.edu (Mike Grady)
Newsgroups: comp.infosystems.gopher
Subject: Perl script for building wais index
Date: 13 Jul 1993 15:50:45 -0500
Organization: University of Illinois at Urbana
Lines: 111
Message-ID: <21v77j$1qd@ux1.cso.uiuc.edu>
NNTP-Posting-Host: ux1.cso.uiuc.edu
Summary: Perl script to descend file hierarchy and build wais index
Keywords: perl wais index

I use the following perl script to descend a set of directories and build a
wais index. It is an alternative to using the find command to feed waisindex
the filenames you want indexed. It will also create the matching link file
(.indexlink) if one doesn't already exist. I find it easier to modify the
exclusion list in perl than with a "straight find". A mod I might add is to
look up the directory's "name" in the .cap file, so the title put into the
.indexlink file is not the generic "Search of Directory", although that is
easy enough to edit by hand.
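
For the .cap mod mentioned above, a rough sketch of the lookup might be the
following. It assumes the usual gopherd layout, where a directory's title is
on a "Name=" line in a file named after that directory inside its parent's
.cap directory. The subroutine name cap_title and its fallback argument are
my own and are not part of the script below.

```perl
# Return the title for directory $path (e.g. "/usr/spool/gopher/gd/docs"),
# taken from the "Name=" line of <parent>/.cap/<dirname>.
# Returns $default when no such .cap entry can be read.
# ASSUMPTION: the .cap layout described above; adjust if your server differs.
sub cap_title {
    local($path, $default) = @_;
    local($parent, $base, $title);
    local($_);

    $path =~ s,/+$,,;                   # drop any trailing slashes
    if ($path =~ m,^(.*)/([^/]+)$,) {
        ($parent, $base) = ($1, $2);
        $parent = '/' if $parent eq '';
    } else {
        ($parent, $base) = ('.', $path);
    }

    $title = $default;
    if (open(CAP, "$parent/.cap/$base")) {
        while (<CAP>) {
            if (/^Name=(.*)$/) { $title = $1; last; }
        }
        close(CAP);
    }
    $title;
}
```

The "Name=" line of the link file could then be built from something like
&cap_title($curdir, "Directory"), so the generic title is still used when
there is no .cap entry.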
---------------
#!/usr/local/bin/perl 
#	Build a waisindex for Gopher; can optionally supply two arguments:
#	buildwais $dir $indexname
#		$dir -- directory for which to build index (relative
#			to where we currently are).
#			(defaults to . -- i.e. where we are now)
#			This is also where the index will reside, in a
#			directory named ".waisindex".
#		$indexname -- "Name" to give to index if you don't
#			want it built with default of "index"
#			(the one value you would use is "indexg" to
#			build a global index that uses DOCN field).
#			DOCN and indexg are a local U. of Ill. construct
#			that required modifying several Gopher programs and
#			allows for a "tag" to be added at end of title
#			retrieved from wais index to identify document it
#			came from.
# Creates a .indexlink file for wais index if it doesn't already exist.

require "find.pl";  	# see list of excluded names at end

umask ( 002 );		# umask for the files and directory we create
$newdirperm = 0775;	# mode for the .waisindex directory
$GOPHERROOT = '/usr/spool/gopher/gd';	  # where your Gopher data tree begins 
#$GOPHERROOT = '/usr/spool/gopher/test';
$PROG = '/usr/staff/grady/bin/waisindex'; # where your waisindex program is

$dir = shift(@ARGV);
if ($dir eq "") {
	$dir = '.';
	$indexdir = '.waisindex';
	$linkfile = '.indexlink';
} else {
	$indexdir = $dir . '/.waisindex';
	$linkfile = $dir . '/.indexlink';
}

$indexname = shift(@ARGV);
if ($indexname eq "") {
	$index = $indexdir . '/index';
} else {
	$index = $indexdir . '/' . $indexname;
}

unless (-e $linkfile) {
	$curdir = `pwd`;
	chop ($curdir);
	die "Not in Gopher tree!\n"
		unless ($curdir eq $GOPHERROOT || index($curdir,"$GOPHERROOT/")==0);
	$pos = length($GOPHERROOT);
	if ($GOPHERROOT eq $curdir) {$linkpath = '7';}
		else {$linkpath = '7' . substr($curdir,$pos);}
	open (LINK, ">$linkfile") || die "Can't open $linkfile: $!\n";
	print LINK "Name=Search of Directory\n";
	print LINK "Numb=2\n";
	print LINK "Type=7\n";
	print LINK "Path=$linkpath/$index\n";
	print LINK "Host=+\n";
	print LINK "Port=+\n";
	close (LINK);
}

unless (-e $indexdir) {
	mkdir ($indexdir, $newdirperm) 
		|| die "Can't create index directory: $!\n";
}

# To just see the file names that would be indexed, rather than actually
# indexing them, use this open instead of the one that follows:
#open (TOWAIS,"| cat");
open (TOWAIS,"| $PROG -d $index -stdin") || die "Can't run $PROG: $!\n";
# Traverse the desired directory tree

&find($dir);

close(TOWAIS) || warn "waisindex did not finish cleanly (status $?)\n";
exit;

# This establishes the files to be indexed. find traverses down the
# hierarchy, and we skip (and prune) anything whose name begins with a dot,
# is exactly core, adm, bin, dev, etc, or usr, or ends in .bak or .lock.
sub wanted {
    (
	(($dev,$ino,$mode,$nlink,$uid,$gid) = lstat($_)) &&
	! /^\./ &&
	! /^core$/ &&
	! /^adm$/ &&
	! /^bin$/ &&
	! /^dev$/ &&
	! /^etc$/ &&
	! /^usr$/ &&
	! /\.bak$/ &&
	! /\.lock$/ &&
	print(TOWAIS "$name\n")
    )
    ||
    ($prune = 1);
}
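
Since the main reason for doing this in perl is the exclusion list, one
variation is to keep the patterns in an array and test names against it in a
helper, so that adding an exclusion is a one-line change. This is only a
sketch: the names @exclude and excluded are illustrative and do not appear in
the script above.

```perl
# Patterns for names that should be neither indexed nor descended into.
@exclude = (
    '^\.',                              # dot files and dot directories
    '^(core|adm|bin|dev|etc|usr)$',     # exact names to skip
    '\.(bak|lock)$',                    # backup and lock files
);

# Return 1 if $name (a basename, as find.pl leaves it in $_) matches any
# pattern in @exclude, and 0 otherwise.
sub excluded {
    local($name) = @_;
    local($pat);
    foreach $pat (@exclude) {
        return 1 if $name =~ /$pat/;
    }
    0;
}
```

With that in place, the body of wanted could shrink to an lstat($_) test,
then !&excluded($_), then the print to TOWAIS, still falling through to
($prune = 1) when any of them fails.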
-- 
Michael Grady,  Univ. of Illinois Computing & Communications Services Office
Rm. 1503 DCL, 1304 W. Springfield Ave., Urbana, IL 61801
Internet: mike-grady@uiuc.edu   phone: (217) 244-1253    fax: (217) 244-7089
Disclaimer: The opinions of CCSO may differ from mine.


