
Which characters must be avoided to ensure that `PSQLException: ERROR: syntax error in tsquery` does not occur? The documentation does not say how to escape the search string: http://www.postgresql.org/docs/8.3/static/datatype-textsearch.html

ideaboxer

1 Answer


Use single quotes around your terms if you want them as phrases/verbatim or they contain characters used in the syntax. Inside an SQL string literal, each of those single quotes has to be doubled:

select to_tsquery('''hello there'' | hi');

Bear in mind that you shouldn't really have crazy characters in your terms, since they are not going to match anything in the tsvector.

The (non-token) characters recognized by the tsquery parser are: \0 (null), (, ), (whitespace), |, &, :, * and !. How you tokenize your query, however, should depend on how you have set up your dictionary. There are many other characters that you will likely not want in your query, not because they cause a syntax error but because their presence means you are not tokenizing your query correctly.
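
As a rough illustration of tokenizing the input before it reaches to_tsquery, here is a minimal Python sketch. It assumes an English-like configuration in which only ASCII letters and digits make useful terms; the function name `build_tsquery` is hypothetical, and a different dictionary setup would need a different pattern.

```python
import re

# Assumption: only ASCII letters and digits are meaningful search terms.
# Everything else (operators, quotes, backslashes, ...) acts as a separator.
_TERM = re.compile(r"[A-Za-z0-9]+")


def build_tsquery(user_input: str) -> str:
    """Return an AND query string built only from alphanumeric terms.

    Returns an empty string when no usable term survives.
    """
    terms = _TERM.findall(user_input)
    # Join the surviving terms with the tsquery AND operator.
    return " & ".join(terms)
```

The result contains only alphanumeric terms separated by `&`, so none of the parser's special characters can appear in it. When it is empty, the caller would probably want to skip the search rather than send an empty query.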

Use the plainto_tsquery version if it's a simple AND query and you don't want to deal with creating the query manually.
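
A sketch of the plainto_tsquery route, again in Python: the raw user input is handed over as a bound parameter and PostgreSQL does the parsing. The table name `documents`, the column `body_tsv`, the `'english'` configuration, and the `%s` placeholder style (as used by psycopg) are all assumptions for illustration.

```python
def plain_search_statement(user_input: str):
    """Build a parameterized statement that lets plainto_tsquery parse raw input.

    Table and column names are hypothetical; adjust them to your schema.
    """
    sql = (
        "SELECT id FROM documents "
        "WHERE body_tsv @@ plainto_tsquery('english', %s)"
    )
    # The input is passed as a parameter, never concatenated into the SQL.
    return sql, (user_input,)


# Usage with a DB-API cursor (not executed here):
#   cursor.execute(*plain_search_statement("hello there (hi) & !:*"))
```

Because plainto_tsquery does not interpret operators in its input, no character stripping is needed on the client side; the trade-off is that users cannot write OR, NOT, or prefix queries.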

kevhender
Chris Farmiloe
  • Please define "crazy characters", then my question is answered :-) – ideaboxer Apr 15 '13 at 17:23
  • (my primary goal is to avoid any occurrence of PSQLException just because somebody entered crazy characters) – ideaboxer Apr 15 '13 at 17:26
  • Any character that does not appear in your dictionary. For an English dictionary I'd say: `[^A-Za-z0-9] == crazy`. – Chris Farmiloe Apr 15 '13 at 17:30
  • Thanks. Maybe there is a list of characters to avoid (if the search term is outside double quotes)? So that I can remove no more than the problematic characters? For URL encoding I know exactly which characters to avoid, they are on page 12 of RFC 3986: http://tools.ietf.org/html/rfc3986#page-12 Where can I find such a definition for tsquery? – ideaboxer Apr 15 '13 at 17:36
  • Try [the source](http://doxygen.postgresql.org/tsquery_8c.html#a7dedd4646a8bc9adb2a12b72b0987fa7) ... I've added the chars I could see – Chris Farmiloe Apr 15 '13 at 18:00
  • You may also want to handle backslash, comma, and single quote, which have special meanings. – Dwayne Towell Mar 26 '14 at 15:58
  • What am I missing here? `select to_tsquery('"hello there" | hi');` results in `ERROR: syntax error in tsquery`. Was this an example of what *doesn't* work? – Crescent Fresh Jul 18 '21 at 15:56