
How can I include an apostrophe (') in a Whoosh query? For example, I want to search for that's:

tws_fileName.tws_query_index( 'that's' )

The query above does not work, and neither does the following:

tws_fileName.tws_query_index( "that's" )
A Alw

1 Answer


You can't do this by default because the RegexTokenizer in the StandardAnalyzer strips all punctuation before indexing. For example, "that's all, folks!" is split into the tokens [that, s, all, folks].
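The splitting can be reproduced with Python's `re` module alone. The sketch below assumes that Whoosh's default token pattern is `\w+(\.?\w+)*`. It covers only the tokenizing step and leaves out the lowercasing and stop-word filtering that the StandardAnalyzer applies afterwards. The `tokenize` helper is hypothetical and not part of Whoosh.

```python
import re

# Assumed default token pattern of Whoosh's RegexTokenizer.
DEFAULT_PATTERN = re.compile(r"\w+(\.?\w+)*")


def tokenize(text, pattern=DEFAULT_PATTERN):
    """Hypothetical helper: return every substring the pattern matches."""
    # finditer + group(0) yields whole matches; findall would return
    # only the contents of the capture group.
    return [m.group(0) for m in pattern.finditer(text)]


tokens = tokenize("that's all, folks!")
print(tokens)  # ['that', 's', 'all', 'folks']
```

The apostrophe is not a word character, so the match stops at "that" and starts again at "s".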

You can work around this in either of two ways:

  1. Using a KEYWORD field instead of a TEXT field, because KEYWORD doesn't use the RegexTokenizer. The drawback is that you can't run phrase searches on a KEYWORD field.

  2. Using a TEXT field with the StandardAnalyzer and a custom regular expression for its RegexTokenizer. In the example below, the modified regular expression accepts apostrophes as valid parts of a token.

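    from whoosh import fields, analysis

    # Allow apostrophes inside tokens in addition to word characters.
    myanalyzer = analysis.StandardAnalyzer(expression=r'[\w\']+(\.?\w+)*')
    # Index myfield with the custom analyzer instead of the default one.
    schema = fields.Schema(myfield=fields.TEXT(analyzer=myanalyzer))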

Apostrophes in myfield are now preserved within tokens: [that's, all, folks]. If you submit your query as "that's" or 'that\'s', you will get a match. However, a search for "that" will no longer find this document, because "that" is no longer a token on its own.
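The trade-off can be checked without building an index. This sketch applies the modified expression from the answer with Python's `re` module and treats a query term as a hit only when it equals a whole token, which is a simplification of how a TEXT field matches. `tokenize` and `matches` are hypothetical helpers, not Whoosh functions.

```python
import re

# Modified expression from the answer, written with double quotes so the
# apostrophe needs no backslash. Apostrophes may now appear inside a token.
APOSTROPHE_PATTERN = re.compile(r"[\w']+(\.?\w+)*")


def tokenize(text):
    """Hypothetical helper: lowercase the text and return the matched tokens."""
    return [m.group(0) for m in APOSTROPHE_PATTERN.finditer(text.lower())]


def matches(term, text):
    """Simplified term lookup: a hit requires the term to equal a whole token."""
    return term.lower() in tokenize(text)


doc = "That's all, folks!"
print(tokenize(doc))           # ["that's", 'all', 'folks']
print(matches("that's", doc))  # True
print(matches("that", doc))    # False
```

If both "that's" and "that" should find the document, one option is to index the text in two fields, one with each analyzer, at the cost of a larger index.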

Steven