I am trying to run a RethinkDB match query with an escaped, user-provided Unicode search parameter:

import re
from rethinkdb import RethinkDB

r = RethinkDB()

search_value = u"\u05e5"  # provided by user via flask
search_value_escaped = re.escape(search_value)  # results in u'\\\u05e5' ->
    # when encoded with "utf-8" gives "\ץ" as expected.

conn = r.connect(...)  # connect via the RethinkDB() instance; the name `rethinkdb` is never bound here

results_cursor_a = r.db(...).table(...).order_by(index="id").filter(
    lambda doc: doc.coerce_to("string").match(search_value)
).run(conn)  # search_value works fine

results_cursor_b = r.db(...).table(...).order_by(index="id").filter(
    lambda doc: doc.coerce_to("string").match(search_value_escaped)
).run(conn)  # search_value_escaped spits an error

The error for search_value_escaped is the following:

ReqlQueryLogicError: Error in regexp `\ץ` (portion `\ץ`): invalid escape sequence: \ץ in:
r.db(...).table(...).order_by(index="id").filter(lambda var_1: var_1.coerce_to('string').match(u'\\\u05e5m'))
                                                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^         

I tried encoding with "utf-8" before and after re.escape(), but I got the same result with different errors. What am I missing? Is it something in my code, or some kind of bug?

EDIT: .coerce_to('string') converts the document to a "utf-8" encoded string. RethinkDB also converts the query to "utf-8" and then matches the two, which is why the first query works even though it looks like a Unicode match inside a string.
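A quick way to see which characters re.escape() actually puts a backslash in front of on a given interpreter is sketched below. The helper name escaped_non_ascii is made up for illustration. Note that the result depends on the Python version: since Python 3.7, re.escape() only escapes characters that can have a special meaning in a regular expression, so non-ASCII letters are left alone there, while older versions (including Python 2) prefix them with a backslash. RethinkDB's match is documented as using RE2 syntax, and the error above suggests that engine rejects a backslash in front of such a character.

```python
import re


def escaped_non_ascii(text):
    """Return the non-ASCII characters of `text` that re.escape() changes.

    Hypothetical diagnostic helper: each character is escaped on its own
    and compared with the original, so the check is exact per character.
    """
    return [ch for ch in text if ord(ch) >= 128 and re.escape(ch) != ch]


search_value = u"\u05e5"  # same sample value as in the question

# An empty list means re.escape() left every non-ASCII character untouched
# on this interpreter; a non-empty list names the characters it escaped.
print(escaped_non_ascii(search_value))
```

If the list is non-empty, those are the characters that end up as an escape sequence in the pattern sent to RethinkDB.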

Bogdan
  • It does look like RethinkDB doesn't accept escaped Unicode characters in its query (an escape which Python usually ignores), so writing my own escape function will be the solution unless someone sheds some light on it. – Bogdan Mar 30 '19 at 17:36
  • You could leave a comment on [this question](https://stackoverflow.com/q/30548165/5320906) and ask if they ever solved the problem (the user is still active on SO). – snakecharmerb Mar 30 '19 at 17:40

1 Answer

From what I can tell, RethinkDB rejects escaped Unicode characters, so I wrote a simple workaround: a custom escape function that still relies on re.escape() for ASCII characters instead of implementing my own character-replacement logic (for fear that I might miss one and create a security issue).

import re

def no_unicode_escape(u):
    escaped_list = []

    for i in u:
        if ord(i) < 128:
            escaped_list.append(re.escape(i))
        else:
            escaped_list.append(i)

    rv = "".join(escaped_list)
    return rv

or a one-liner:

import re

def no_unicode_escape(u):
    return "".join(re.escape(i) if ord(i) < 128 else i for i in u)

This yields the required result of escaping "dangerous" characters and works with RethinkDB as I wanted.
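For illustration, here is a small self-contained check of what the function produces. The sample input and the haystack strings are made up, and the last two checks use Python's own re module, which is not the engine RethinkDB uses, so treat them only as a sanity check that the pattern is matched literally.

```python
import re


def no_unicode_escape(u):
    # Same one-liner as above: escape ASCII characters only.
    return "".join(re.escape(i) if ord(i) < 128 else i for i in u)


search_value = u"a.b\u05e5"  # hypothetical user input
pattern = no_unicode_escape(search_value)

# The ASCII dot gets a backslash, the non-ASCII character is left untouched.
print(pattern == u"a\\.b\u05e5")

# Sanity check with Python's re (not RethinkDB): the dot is now literal,
# so "a.b" matches but "aXb" does not.
print(re.search(pattern, u"xx a.b\u05e5 yy") is not None)
print(re.search(pattern, u"xx aXb\u05e5 yy") is None)
```

The resulting pattern can then be passed to .match() in place of search_value_escaped from the question.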

Bogdan