Whoosh query parser converts '-' into AND

Question

I'm trying to use Whoosh to do text searches.

When I search for a string containing a hyphen (e.g. 'IGF-1R'), Whoosh ends up searching for 'IGF' AND '1R' instead of treating it as a single term.

Any idea why?

Here is the code I'm using:

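from whoosh.qparser import QueryParser
from whoosh.query import FuzzyTerm

# FuzzyTerm subclass with custom defaults (prefixlength=2)
class MyFuzzyTerm(FuzzyTerm):
    def __init__(self, fieldname, text, boost=1.0, maxdist=1, prefixlength=2, constantscore=True):
        super(MyFuzzyTerm, self).__init__(fieldname, text, boost, maxdist, prefixlength, constantscore)

# ix is an index that was opened elsewhere
with ix.searcher() as searcher:
    qp = QueryParser("gene", schema=ix.schema, termclass=MyFuzzyTerm)
    q = qp.parse('IGF-1R')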

q evaluates to:

And([MyFuzzyTerm('gene', 'igf', boost=1.000000, maxdist=1, prefixlength=2), MyFuzzyTerm('gene', '1r', boost=1.000000, maxdist=1, prefixlength=2)])

I'd like it to be:

MyFuzzyTerm('gene', 'igf-1r', boost=1.000000, maxdist=1, prefixlength=2)

score 0 · Accepted Answer · edited Dec 27 '16 at 16:03


Splitting text into words is the job of the tokenizer. I usually use whoosh.analysis.SpaceSeparatedTokenizer(), but in your case the tokenizer is splitting on both spaces and dashes.
So I bet you are using whoosh.analysis.CharsetTokenizer(charmap) with space and dash treated as separators in charmap, or whoosh.analysis.RegexTokenizer(expression=<_sre.SRE_Pattern object>, gaps=False) with a pattern that does not match the dash.
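
To make the difference concrete, here is a minimal sketch in plain Python that imitates the two tokenizers with the standard re module. The two patterns are assumptions: they are what I believe RegexTokenizer uses by default and what SpaceSeparatedTokenizer wraps, so check them against your installed Whoosh version. The tokenize helper is hypothetical and only mimics gaps=False matching plus lowercasing.

```python
import re

# Assumed patterns (not taken from the question): believed to correspond to
# Whoosh's default RegexTokenizer expression and to SpaceSeparatedTokenizer.
DEFAULT_PATTERN = re.compile(r"\w+(\.?\w+)*")
SPACE_PATTERN = re.compile(r"[^ \t\r\n]+")


def tokenize(pattern, text):
    """Hypothetical helper: one lowercased token per regex match (gaps=False style)."""
    return [match.group(0).lower() for match in pattern.finditer(text)]


# The dash is not matched by \w, so the word-based pattern stops at it.
default_tokens = tokenize(DEFAULT_PATTERN, "IGF-1R")
# The whitespace-based pattern keeps everything between spaces together.
space_tokens = tokenize(SPACE_PATTERN, "IGF-1R")

print(default_tokens)  # ['igf', '1r']
print(space_tokens)    # ['igf-1r']
```

If this matches your setup, the likely fix is to give the gene field an analyzer built on a whitespace tokenizer, for example TEXT(analyzer=SpaceSeparatedTokenizer() | LowercaseFilter()), and then rebuild the index so the stored terms keep the dash too.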

edited Dec 27 '16 at 16:03

ismnoiet


answered Dec 27 '16 at 15:28

Assem


Thanks for your help! I'll give it a try. – yoann Jan 28 '17 at 21:23
