{BUGS} Search engine bugs
David A. Bayly
dbayly at udena.ch
Fri Apr 18 13:09:02 PDT 2003
People have been complaining that words on Manila site pages indexed
by the search engine do not appear in the index.
So far I have found two reasons for this:
1) There is an undocumented cache in the search engine that holds all
search results. It uses the value of
config.mainResponder.search.prefs.hoursToCacheSearchResults to
determine the lifetime of a cached entry. The default value is 24
hours, and since there is no user interface to change it, on most
servers this will be the lifetime of every cached search. As far as I
can tell there is no other way to clear this cache; see
mainResponder.search.server.getResults.
This is a poor strategy for fast-changing sites: a page edited after a
query was cached will not show up in that query's results for up to
24 hours.
2) When a search string includes high-ASCII characters, they are
deleted, because string.dropNonAlphas is applied to every word in
the search string. The indexing process also discards all high-ASCII
characters, so the search engine doesn't work for most European
languages other than English.
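Returning to the first problem: the Python sketch below models a time-to-live cache of the kind described in item 1. It is not the Frontier implementation; `SearchResultCache`, its methods, and the injectable `clock` are hypothetical, and `hours_to_cache` merely stands in for hoursToCacheSearchResults. The `clear()` method is the explicit flush that the real engine apparently lacks.

```python
import time


class SearchResultCache:
    """Hypothetical TTL cache illustrating the behaviour described in item 1."""

    def __init__(self, hours_to_cache=24.0, clock=time.monotonic):
        # hours_to_cache plays the role of hoursToCacheSearchResults (assumption).
        self.ttl_seconds = hours_to_cache * 3600.0
        self._clock = clock
        self._entries = {}  # query -> (time cached, results)

    def get(self, query, compute):
        now = self._clock()
        hit = self._entries.get(query)
        if hit is not None and now - hit[0] < self.ttl_seconds:
            # Served from cache even if the index has changed since.
            return hit[1]
        results = compute(query)
        self._entries[query] = (now, results)
        return results

    def clear(self):
        # Explicit flush; hypothetical, not something the real engine offers.
        self._entries.clear()
```

With the default 24-hour lifetime, a query cached at 9:00 keeps returning the same results until 9:00 the next day, no matter how often the site is re-indexed in between, unless something like `clear()` is called.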
Also, as others have mentioned before, the practice of eliding
trailing 's' and 'S' characters makes no sense in languages other
than English.
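Both normalization problems (high-ASCII removal in item 2 and the trailing-s rule) can be illustrated with a small Python sketch. All function names are hypothetical. `drop_non_alphas` only approximates what string.dropNonAlphas appears to do from the description above, and `elide_trailing_s` assumes a single trailing s or S is removed. Frontier in 2003 worked with 8-bit strings rather than Python's Unicode strings, so this shows the effect, not the actual code. The language-aware variant assumes the site's language is known, for example from a per-site setting that does not exist today.

```python
import string

ASCII_LETTERS = frozenset(string.ascii_letters)


def drop_non_alphas(word):
    # Hypothetical stand-in for string.dropNonAlphas: keep only A-Z and a-z,
    # so accented letters such as "ü" or "é" are silently removed.
    return "".join(ch for ch in word if ch in ASCII_LETTERS)


def elide_trailing_s(word):
    # Hypothetical English-style "stemming": drop one trailing s or S.
    return word[:-1] if word[-1:] in ("s", "S") else word


def legacy_normalize(word):
    # The pipeline as described in the post, lowercased for comparison.
    return elide_trailing_s(drop_non_alphas(word)).lower()


def language_aware_normalize(word, lang):
    # Keep every Unicode letter; strip plurals only for English.
    w = "".join(ch for ch in word if ch.isalpha()).lower()
    return elide_trailing_s(w) if lang == "en" else w
```

Under these assumptions, distinct words collapse into the same index term: German "hütte" (hut) and "hätte" (would have) both become "htte", "Kurs" (course) becomes "kur" like "Kur" (cure), and French "fils" (son) becomes "fil" (thread). The language-aware variant keeps all of these apart while still reducing English "pages" to "page".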
Finally, the stop words need to be language-dependent.
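A language-dependent stop word table could be as simple as a mapping from language code to word set, sketched here in Python. The table contents are tiny illustrative samples, not complete stop lists, and `STOP_WORDS` and `remove_stop_words` are hypothetical names; how a page's language would be determined is left open.

```python
STOP_WORDS = {
    # Tiny illustrative samples, not complete stop lists.
    "en": frozenset({"a", "and", "of", "the", "to"}),
    "de": frozenset({"das", "der", "die", "und", "zu"}),
    "fr": frozenset({"de", "et", "la", "le", "les"}),
}


def remove_stop_words(words, lang):
    # Unknown languages get no stop words rather than falling back to English.
    stops = STOP_WORDS.get(lang, frozenset())
    return [w for w in words if w.lower() not in stops]
```

The word "die" shows why one global list cannot work: it is a German article that should be dropped, but a meaningful English word, so `remove_stop_words(["die", "Katze"], "de")` keeps only "Katze" while the English list keeps "die".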
--
- David Bayly. Programmer and digest reader. dbayly at udena dot ch
Digest Readers do it once a day.