I have indexed all my documents with a schema like this:

ID = ID(stored=True)
Body = TEXT(analyzer=StemmingAnalyzer(), stored=False, field_boost=4.0)
Name = TEXT(stored=True, field_boost=5.0)
Brand = TEXT(analyzer=StemmingAnalyzer(), stored=False, field_boost=4.0)
...

My search module looks like this:

qp = MultifieldParser(
    ["Name", "Body", "Brand", "Familia", "Superpadre", "Tags", "ID"],
    schema=ix.schema)

But when I search for iphone 6, the parsed query looks like this:

<Top 20 Results for Or([Term('Name', u'iphone'), Term('Body',
 u'iphon'), Term('Brand', u'iphon'), Term('Familia', u'iphon'), 
Term('Superpadre', u'iphon'), And([Term('Tags', u'iphone'),  
Term('Tags', u'6')]), Term('ID', u'iphon')]) runtime=0.0327291488647>

The digit 6 is only searched for in the Tags field, not in Name, Brand, etc.

How can I make the query search for the 6 in the other fields as well?

Thank you all in advance.

marc_s
Claudia Guirao

2 Answers

By default, Whoosh treats every single-character token like a stop word and drops it during analysis. This means single letters and single digits, such as the 6 in your query, are ignored in any field whose analyzer includes the default StopFilter.

Stop words are words which are filtered out before or after processing of natural language data (text). (ref)

You can see in the signature that StopFilter applies minsize=2 by default, in addition to its predefined stop list:

class whoosh.analysis.StopFilter(
        stoplist=frozenset(['and', 'is', 'it', 'an', 'as', 'at', 'have', 'in', 'yet', 'if', 'from', 'for', 'when', 'by', 'to', 'you', 'be', 'we', 'that', 'may', 'not', 'with', 'tbd', 'a', 'on', 'your', 'this', 'of', 'us', 'will', 'can', 'the', 'or', 'are']),
        minsize=2,
        maxsize=None,
        renumber=True,
        lang=None
        )
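To make the effect of minsize concrete, here is a minimal pure-Python sketch of how a stop filter with a minimum token length behaves. It is a simplified model for illustration, not Whoosh's actual implementation: the function name `filter_tokens`, the `\w+` tokenizer, and the abbreviated stop list are all assumptions.

```python
import re

# Abbreviated stop list for illustration only; Whoosh's default set is longer.
STOP_WORDS = frozenset(["and", "is", "it", "a", "the", "of", "to", "in"])


def filter_tokens(text, stoplist=STOP_WORDS, minsize=2, maxsize=None):
    """Simplified model of a stop filter with a minimum token length."""
    stops = frozenset(stoplist or ())
    kept = []
    for token in re.findall(r"\w+", text.lower()):
        if len(token) < minsize:
            continue  # too short: dropped even though it is not in the stop list
        if maxsize is not None and len(token) > maxsize:
            continue  # too long
        if token in stops:
            continue  # explicit stop word
        kept.append(token)
    return kept


print(filter_tokens("iphone 6"))             # ['iphone']
print(filter_tokens("iphone 6", minsize=1))  # ['iphone', '6']
```

The point of the sketch is that the length check is independent of the stop list: "6" is not a stop word, yet it is removed because it is shorter than minsize.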

So you can resolve this issue by redefining your schema so that the analyzer either has no StopFilter at all or uses it with minsize=1:

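from whoosh.analysis import StemmingAnalyzer
from whoosh.fields import Schema, TEXT

# stoplist=None builds the analyzer without a StopFilter
schema = Schema(content=TEXT(analyzer=StemmingAnalyzer(stoplist=None)))

or

# keep the stop list, but allow single-character tokens
schema = Schema(content=TEXT(analyzer=StemmingAnalyzer(minsize=1)))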
Assem
I solved it by passing this parameter to the analyzer in my schema:

StemmingAnalyzer(minsize=1)
Claudia Guirao