How can I include an apostrophe (') in a Whoosh query? For example, to search for "that's":
tws_fileName.tws_query_index( 'that's' )
The query above does not work, and neither does the following:
tws_fileName.tws_query_index( "that's" )
You can't do this by default because all punctuation is stripped before indexing by the RegexTokenizer in the StandardAnalyzer. For example, "that's all, folks!" is split by the tokenizer into [that, s, all, folks].
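The splitting described above can be reproduced with Python's `re` module alone. This is only a sketch: it assumes the tokenizer's default expression is `\w+(\.?\w+)*` and that each regex match becomes one token. It does not call Whoosh, and it ignores any filtering the analyzer applies after tokenizing.

```python
import re

# Assumed default tokenizer expression (an assumption, not taken from the
# answer): runs of word characters, optionally joined by single dots.
DEFAULT_PATTERN = re.compile(r"\w+(\.?\w+)*")

text = "that's all, folks!"

# Each regex match stands in for one token. The apostrophe is not a word
# character, so it ends the match and "that's" falls apart into two pieces.
tokens = [m.group(0) for m in DEFAULT_PATTERN.finditer(text)]

print(tokens)
```

Because the apostrophe never reaches the index, no quoting style on the query side can bring it back.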
You could work around this in either of two ways:
1. Using a KEYWORD field instead of a TEXT field, because it doesn't use the RegexTokenizer. However, you wouldn't be able to do phrase searches on a keyword field.
2. Using a TEXT field with the StandardAnalyzer and a custom RegexTokenizer regular expression. In the example below, the modified regular expression accepts apostrophes as valid parts of a token.
from whoosh import fields, analysis
myanalyzer = analysis.StandardAnalyzer(expression=r'[\w\']+(\.?\w+)*')
schema = fields.Schema(myfield=fields.TEXT(analyzer=myanalyzer))
Any apostrophes in myfield will now be preserved with the token: [that's, all, folks]. If you submit your query as "that's" or 'that\'s', you will get a match. However, a search for "that" will no longer find this document because there is no such token.
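The match and the miss described above come down to exact token comparison. The sketch below illustrates that with Python's `re` module standing in for the tokenizer. The helper `tokenize` and the membership check are hypothetical stand-ins for illustration, not Whoosh calls, and lowercasing is included on the assumption that the analyzer lowercases tokens.

```python
import re

# The modified expression from the answer. Inside a character class the
# apostrophe needs no backslash, so this matches the same text as
# r'[\w\']+(\.?\w+)*'.
PATTERN = re.compile(r"[\w']+(\.?\w+)*")

def tokenize(text):
    """Hypothetical stand-in for the tokenizer: one lowercased token per match."""
    return [m.group(0).lower() for m in PATTERN.finditer(text)]

indexed = tokenize("that's all, folks!")

# Both Python spellings of the query are the same string.
same_query = ("that's" == 'that\'s')

# A query term can only match if the identical token was indexed.
matches_with_apostrophe = all(tok in indexed for tok in tokenize("that's"))
matches_without = all(tok in indexed for tok in tokenize("that"))

print(indexed, matches_with_apostrophe, matches_without)
```

If both "that" and "that's" need to match, one option is to index the text in two fields, one per analyzer, and search both.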