2

If anyone knows a simple answer to this, I won't have to wade through creating an extra index with escaped strings and crying my eyes out while littering my pretty code.

Basically, the Lucene search we have running cannot handle any non-letter characters. Spaces, percent signs, dots, dashes, slashes, you name it. This is highly infuriating, because I cannot run any search on items containing these characters, no matter whether I escape them or not.

I have two options: kill these characters in a separate index and strip them from the names I'm searching, or stop goddamn searching.

Igor
John

3 Answers

3

You can escape special characters using '/'. Lucene treats the following as special characters, and you will have to escape them to make it work.

+ - && || ! ( ) { } [ ] ^ " ~ * ? : \ 

If you want to search for "2+3", the query should be "2/+3".

Clement Herreman
Jugs
  • Thank you. This is the correct answer (I foolishly tried to escape with backslash all the time). However we have long moved off from the Zend-managed Lucene index as it was a horrible god damn mess. Next time we'll break out a Solr instance and bypass all this hell. – John Oct 20 '09 at 10:24
  • I just have a question! Don't I need to escape the `$` sign, as it is a special character that marks the end of a string? – Ankit Jun 28 '12 at 20:38
3

Use QueryParser.escape(String s) to escape the query string.
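
For readers who want to see what this kind of escaping amounts to, here is a minimal hand-rolled sketch in Java. It is not Lucene's source code: the class `LuceneEscapeSketch` and the helper `requireAll` are hypothetical names made up for illustration, the character set is taken from the list of special characters quoted elsewhere in this thread, and the escape character is assumed to be the backslash that Lucene's query syntax documentation describes. In a real project you would call `QueryParser.escape` itself.

```java
// Hand-rolled illustration only; NOT Lucene's implementation.
// In real code, prefer the library's own QueryParser.escape(String).
final class LuceneEscapeSketch {

    // Special characters as listed in this thread. '&' and '|' are only
    // special when doubled; this sketch simply escapes every occurrence.
    private static final String SPECIALS = "+-&|!(){}[]^\"~*?:\\";

    /** Prefixes every special character in a user-supplied term with a backslash. */
    static String escape(String term) {
        StringBuilder out = new StringBuilder(term.length() + 8);
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (SPECIALS.indexOf(c) >= 0) {
                out.append('\\');
            }
            out.append(c);
        }
        return out.toString();
    }

    /**
     * Hypothetical helper: escapes each term first, then prepends the '+'
     * (required) operator, so the operator itself is never escaped.
     */
    static String requireAll(String... terms) {
        StringBuilder query = new StringBuilder();
        for (String term : terms) {
            if (query.length() > 0) {
                query.append(' ');
            }
            query.append('+').append(escape(term));
        }
        return query.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("2+3"));               // prints: 2\+3
        System.out.println(requireAll("web", "mail"));   // prints: +web +mail
        System.out.println(requireAll("abc-def", "50%")); // prints: +abc\-def +50%
    }
}
```

The design point is that escaping applies to the user-supplied terms, not to the finished query string. Escaping a whole query would also neutralize any operators it contains, so operators such as `+` are added only after each term has been escaped.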

Ralph
  • This is not the solution if you're performing a `Boolean` query. Because a query like `+web +mail` gets escaped and it searches for `web` or `mail` instead for both keywords. Anyone knows the right escaping for `Boolean` queries? – TiMESPLiNTER Aug 11 '14 at 06:04
1

According to http://lucene.apache.org/core/old_versioned_docs/versions/2_9_1/queryparsersyntax.html#- the escape character is the backslash (`\`), not the forward slash.

And to answer Ankit, $ doesn't seem to have to be escaped since it's not a special character.

Escaping the dash as suggested by Ralph doesn't make a difference for me (Zend Lucene). You'd think that when a word 'abc-def' is indexed and you search for 'abc-def', you'd somehow find that word, regardless of whether the dash is ignored at the indexing step or not. The same input should give the same result. The word seems to be indexed as two separate tokens, 'abc' and 'def'. Yet searching for 'abc-def' gives no results, while 'abc def' does.
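
One workaround consistent with that observation is to break the search input into tokens the same way the index appears to, before handing it to the search. The sketch below is only an illustration under a stated assumption: that the analyzer used at indexing time splits text at every non-letter character, which is what the behaviour described here suggests but which I have not verified against the analyzer's code. The class `QueryNormalizerSketch` and its methods are hypothetical names, not part of Lucene or Zend.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only; mirrors an ASSUMED index-time tokenization.
final class QueryNormalizerSketch {

    // ASSUMPTION: the indexing analyzer breaks text at every run of
    // non-letter characters. Adjust this pattern to match your analyzer.
    private static final String NON_LETTERS = "[^\\p{L}]+";

    /** Splits the input at non-letter characters and drops empty pieces. */
    static List<String> tokens(String input) {
        List<String> result = new ArrayList<>();
        for (String piece : input.split(NON_LETTERS)) {
            if (!piece.isEmpty()) {
                result.add(piece);
            }
        }
        return result;
    }

    /** Joins the tokens with single spaces, e.g. "abc-def" becomes "abc def". */
    static String normalize(String input) {
        return String.join(" ", tokens(input));
    }

    public static void main(String[] args) {
        System.out.println(normalize("abc-def")); // prints: abc def
    }
}
```

With the input normalized like this, the search for 'abc-def' is issued as 'abc def', which is the form that returned results in the test described above.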

Alex Haan