I'm trying to index some old documents for searching -- 16th, 17th, 18th century.

Modern stemmers don't seem to handle the antiquated word endings: worketh, liveth, walketh.

Are there stemmers that specialize in the English from the time of Shakespeare and the King James Bible? I'm currently using solr.PorterStemFilterFactory.

Eric Wilson
  • Do you know of any dictionary files available online that cover that older English? – cheffe Jun 26 '15 at 04:49
  • [The DOE](https://en.wikipedia.org/wiki/Dictionary_of_Old_English) looks good, but it is a work in progress (as of 2015). – cheffe Jun 26 '15 at 04:52
  • @cheffe I do not, but suppose I did. Is there a way to make a stemmer out of a dictionary file with Solr/Lucene? – Eric Wilson Jun 26 '15 at 10:34
  • Yes. For Solr, use the [HunspellStemFilterFactory](https://cwiki.apache.org/confluence/display/solr/Filter+Descriptions#FilterDescriptions-HunspellStemFilter); if you are using Lucene directly, use the [HunspellStemFilter](http://lucene.apache.org/core/5_2_0/analyzers-common/org/apache/lucene/analysis/hunspell/HunspellStemFilter.html) itself. – cheffe Jun 26 '15 at 10:50

1 Answer

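The rule changes needed for those archaic endings look minimal.

So it might be possible to copy the PorterStemmer class, along with its related factories and filters, and modify the copy.

Alternatively, it might be possible to add those specific rules as a regular-expression filter (for example, solr.PatternReplaceFilterFactory) that runs before the Porter filter.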
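To illustrate the regular-expression idea, here is a minimal sketch in Python rather than Solr configuration. It rewrites a trailing "-eth" to "-es" and leaves the rest to whatever modern stemmer runs afterwards. The function name, the three-letter minimum, and the exception list are all assumptions made for this example, not part of Solr or Lucene, and a real rule set would need testing against the actual corpus.

```python
import re

# Assumed rule: at least three letters followed by "eth" at the end of a word.
# The length guard leaves short words such as "teeth" untouched.
ARCHAIC_ETH = re.compile(r"\b([a-z]{3,})eth\b")

# Hypothetical exception list for words that merely happen to end in "eth".
EXCEPTIONS = {"elizabeth", "twentieth"}


def normalize_archaic(text: str) -> str:
    """Rewrite archaic '-eth' verb endings to '-es' ahead of a modern stemmer."""

    def rewrite(match: re.Match) -> str:
        word = match.group(0)
        if word in EXCEPTIONS:
            return word
        # Keep the part before "eth" and append the modern ending.
        return match.group(1) + "es"

    # Lowercase first so the pattern only has to handle a-z.
    return ARCHAIC_ETH.sub(rewrite, text.lower())


print(normalize_archaic("He worketh, liveth and walketh"))
```

The rewritten tokens ("workes", "lives", "walkes") are not meant to be shown to users; they only give the stemmer a plural-like ending it already has rules for. In Solr, the equivalent would be a pattern-replace filter placed before the Porter filter in the analyzer chain.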

Alexandre Rafalovitch