
I am trying to parse a query that contains both text and a number.

Example: Apple iphone 6 results in:

  Results for And([Term('title', u'apple'), Term('title', u'iphone')])

while Apple iphone 62 results in:

  Results for And([Term('title', u'apple'), Term('title', u'iphone'), Term('title', u'62')])

Why isn't it accepting a single-digit number?

blackmamba
  • Just out of curiosity, I checked out whoosh via mercurial and can't reproduce: `p = whoosh.qparser.QueryParser("field", None)`, `p.parse('htc one 8')` leads to `And([Term('field', 'htc'), Term('field', 'one'), Term('field', '8')])` – Jasper Jan 18 '15 at 21:54
  • It depends on which analyzer you chose for the `field`. Single characters are treated as stop words by the standard analyzer; check my answer. – Assem May 29 '15 at 22:02
  • @blackmamba please check my answer and consider upvoting it or marking it as accepted. – Assem Dec 03 '15 at 01:05

1 Answer


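By default, Whoosh's StandardAnalyzer (the analyzer a TEXT field uses unless you pass another one) drops every single-character token, treating it like a stop word. This means all single letters and digits are ignored, both when indexing and when parsing queries.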

Stop words are words that are filtered out before or after processing of natural language data (text). (ref)

You can check that StopFilter uses minsize=2 by default, in addition to its predefined stop list, so any token shorter than two characters is removed:

class whoosh.analysis.StopFilter(
        stoplist=frozenset(['and', 'is', 'it', 'an', 'as', 'at', 'have', 'in', 'yet', 'if', 'from', 'for', 'when', 'by', 'to', 'you', 'be', 'we', 'that', 'may', 'not', 'with', 'tbd', 'a', 'on', 'your', 'this', 'of', 'us', 'will', 'can', 'the', 'or', 'are']),
        minsize=2,
        maxsize=None,
        renumber=True,
        lang=None
        )
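To make the minsize rule concrete, here is a simplified pure-Python imitation of the tokenize, lowercase, and stop-filter chain. It is an illustration under stated assumptions, not Whoosh's own code: analyze() and STOP_WORDS are hypothetical names, the token regex is modeled on Whoosh's default tokenizer pattern, and the stop list is only a subset of the one above.

```python
import re

# Hypothetical subset of the stop list shown above (not Whoosh's object).
STOP_WORDS = frozenset(["a", "an", "and", "the", "of", "to", "in", "is", "it"])

# Modeled on Whoosh's default RegexTokenizer pattern (an assumption).
TOKEN_PATTERN = re.compile(r"\w+(\.?\w+)*")


def analyze(text, stoplist=STOP_WORDS, minsize=2, maxsize=None):
    """Imitate tokenizer -> lowercase -> stop filter.

    stoplist=None skips the stop filter entirely (including the size
    checks), mirroring StandardAnalyzer(stoplist=None).
    """
    tokens = []
    for match in TOKEN_PATTERN.finditer(text):
        token = match.group(0).lower()
        if stoplist is not None:
            # A token is dropped if it is too short, too long, or a stop word.
            if len(token) < minsize or token in stoplist:
                continue
            if maxsize is not None and len(token) > maxsize:
                continue
        tokens.append(token)
    return tokens


print(analyze("Apple iphone 6"))                  # ['apple', 'iphone']
print(analyze("Apple iphone 62"))                 # ['apple', 'iphone', '62']
print(analyze("Apple iphone 6", minsize=1))       # ['apple', 'iphone', '6']
print(analyze("Apple iphone 6", stoplist=None))   # ['apple', 'iphone', '6']
```

Note that with minsize=1 the stop words themselves (such as "a") are still removed; only the length cutoff changes.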

So you can resolve this issue by redefining your schema with an analyzer that either has no StopFilter or uses one with minsize=1. Passing stoplist=None to StandardAnalyzer removes the StopFilter entirely:

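from whoosh.analysis import StandardAnalyzer
from whoosh.fields import Schema, TEXT

# No StopFilter, so single-character tokens such as "6" are kept
schema = Schema(content=TEXT(analyzer=StandardAnalyzer(stoplist=None)))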
Assem