{BUGS} Search engine bugs

David A. Bayly dbayly at udena.ch
Fri Apr 18 13:09:02 PDT 2003


People were complaining that words on Manila site pages indexed by 
the search engine did not appear in the index.

To date I have found two reasons for this:

1) There is an undocumented cache in the search engine that holds all 
search results. It uses the value in 
config.mainResponder.search.prefs.hoursToCacheSearchResults to 
determine the lifetime of a cached entry. The default value is 24 
hours, and since there is no user interface to change this, on most 
servers this will be the lifetime of every cached search. As far as I 
can find, there is no other way to clear this cache; see 
mainResponder.search.server.getResults.

This is not an optimal strategy for fast-changing sites.
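
To make the effect concrete, here is a minimal sketch in Python (not 
UserTalk, and not the actual Frontier code) of a results cache with a 
fixed lifetime. The names SearchCache, get_results and run_search are 
hypothetical; only the 24-hour default comes from the description 
above.

```python
import time

HOURS_TO_CACHE = 24  # assumed to mirror the default described above


class SearchCache:
    """Hypothetical stand-in for a search-results cache with a fixed lifetime."""

    def __init__(self, run_search, hours_to_cache=HOURS_TO_CACHE, clock=time.time):
        self._run_search = run_search        # callable: query -> iterable of hits
        self._ttl = hours_to_cache * 3600    # lifetime in seconds
        self._clock = clock                  # injectable so the sketch can be tested
        self._entries = {}                   # query -> (time stored, results)

    def get_results(self, query):
        now = self._clock()
        entry = self._entries.get(query)
        if entry is not None and now - entry[0] < self._ttl:
            # Still fresh: the index is not consulted at all.
            return entry[1]
        # Missing or expired: run the real search and remember the answer.
        results = list(self._run_search(query))
        self._entries[query] = (now, results)
        return results
```

With a cache like this, a page indexed one minute after somebody 
searched for a word stays invisible for that word until the entry 
expires, however often the index itself is updated.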


2) When a search string includes high-ASCII characters, they are 
deleted because string.dropNonAlphas is applied to every word in 
the search string. The indexing process also discards all high-ASCII 
characters, so the search engine doesn't work for most European 
languages other than English.
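
As an illustration only, the following Python sketch approximates the 
effect described above. drop_non_alphas_ascii is a hypothetical 
stand-in, not the source of string.dropNonAlphas, and 
drop_non_alphas_unicode is one possible alternative that keeps 
accented letters.

```python
def drop_non_alphas_ascii(word):
    """Keep only the letters A-Z and a-z (assumed to approximate the effect described)."""
    return "".join(ch for ch in word if ch.isascii() and ch.isalpha())


def drop_non_alphas_unicode(word):
    """Keep every character that Unicode classifies as a letter."""
    return "".join(ch for ch in word if ch.isalpha())
```

In this sketch the ASCII-only filter turns "Zürich" into "Zrich" and 
"café" into "caf", while the Unicode-aware filter leaves both words 
intact and still strips digits and punctuation.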

Also, as others have mentioned before, the practice 
of eliding trailing s and S characters makes no sense in languages 
other than English.
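
A small Python sketch shows why. It assumes the rule drops a single 
trailing s or S; the exact rule Frontier applies is not shown here, 
and strip_trailing_s is a hypothetical name.

```python
def strip_trailing_s(word):
    """Drop one trailing 's' or 'S', as a crude English plural stripper (assumed rule)."""
    if word[-1:] in ("s", "S"):
        return word[:-1]
    return word
```

For English this maps "pages" to "page", which is the intent. For 
German it maps "Haus" to "Hau" and "Glas" to "Gla", neither of which 
is a word.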

Finally, the stop words need to be language-dependent.
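
One way this could look, sketched in Python with tiny made-up word 
lists. STOP_WORDS and remove_stop_words are hypothetical names, and 
real stop word lists would be much longer.

```python
# Illustrative stop word lists keyed by language code (assumed, not Frontier's).
STOP_WORDS = {
    "en": {"the", "a", "an", "and", "of"},
    "de": {"der", "die", "das", "und", "ein"},
}


def remove_stop_words(words, lang):
    """Drop the words that are stop words in the given language."""
    stops = STOP_WORDS.get(lang, set())
    return [w for w in words if w.lower() not in stops]
```

The word "die" is an article in German but a content word in English, 
so a single shared list would either drop it from English searches or 
keep it in German ones.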
-- 

- David Bayly.       Programmer and digest reader.     dbayly at udena dot ch
   		Digest Readers do it once a day.



More information about the Frontier-Server mailing list