7

I want to use a regex like (^|\s)1001(\s|$) in a Firebird similar to expression:

Examples:

  • abc 1001 abc - true
  • abc 121001 abc - false
  • 1001 abc - true
  • 121001 - false
  • abc 1001 - true

I try'd to convert it to a regex in Firebird:

Where COLUMN similar to (^|[:WHITESPACE:])abc 1001 abc($|[:WHITESPACE:]), but ^ (start of line) and $ (end of line) is not working and the query end with:

Invalid SIMILAR TO pattern Exception.

I can not find anything about start and end of line in the Firebird Doc's at https://firebirdsql.org/refdocs/langrefupd25-similar-to.html

Mark Rotteveel
  • 100,966
  • 191
  • 140
  • 197
smoothie
  • 93
  • 1
  • 6

2 Answers2

3

From the Firebird 2.5 Language Reference, SIMILAR TO documentation:

SIMILAR TO matches a string against an SQL regular expression pattern. Unlike in some other languages, the pattern must match the entire string in order to succeed—matching a substring is not enough.

In other words, the regular expression is multi-line and - given the linked documentation - provides no start/end anchors as those are already implied (but then whole string, not per line), as partial matches are not supported.

The regular expression implementation in Firebird conforms to the SQL standard, which also doesn't define start / end anchors.

Given your requirements, you probably need something like:

'(abc 1001( %)?)|((% )?1001 abc)'

Where ( %)? means optionally match a space and zero or more of any character. Given the whole string must match, that means it finds either a space or the end of the string, and similar for (% )?.

You may need to add additional terms if you also need to find this in the middle of a string (but none of your examples suggested that).

Or, a direct equivalent of (^|\s)1001(\s|$):

'(%[[:WHITESPACE:]])?1001([[:WHITESPACE:]]%)?'

An earlier version of this answer used (% |) instead of (% )?, but given empty terms are not documented nor part of the standard, that is possibly an implementation bug or at best an undocumented feature. Use that at your own risk.

Mark Rotteveel
  • 100,966
  • 191
  • 140
  • 197
  • I needed to scan for `CA` as a whole word in a few columns of every row. I needed to check if the two letters were not preceded or followed by letters. I used `SIMILAR TO '(%[^[:ALPHA:]])?CA([^[:ALPHA:]]%)?'` – mickmackusa Mar 01 '23 at 22:22
2

Now, the (^|\s)1001(\s|$) would not work since it means you want to get partial matches. It is not possible with SIMILAR TO:

SIMILAR TO matches a string against an SQL regular expression pattern. Unlike in some other languages, the pattern must match the entire string in order to succeed—matching a substring is not enough.

Then, (^|\s) means either start of string or whitespace. That means, you should check if the string has any chars and then a whitespace or just 1001 can appear at the start of the string. ($|\s) means either end of string or whitespace. That means, you need to account for 3 cases:

  • Any chars, whitespace, 1001, whitespace and any chars
  • 1001, whitesapce, any chars
  • Any chars, whitespace, 1001

You need to use

WHERE col SIMILAR TO '%[[:WHITESPACE:]]1001[[:WHITESPACE:]]%' or col SIMILAR TO '1001[[:WHITESPACE:]]%' or col SIMILAR TO '%[[:WHITESPACE:]]1001'
Mark Rotteveel
  • 100,966
  • 191
  • 140
  • 197
Wiktor Stribiżew
  • 607,720
  • 39
  • 448
  • 563
  • They are regular expressions and you can do more than just use wildcards, just not the feature rich regular expressions a lot of language provides. – Mark Rotteveel Aug 31 '18 at 17:19
  • @MarkRotteveel These are not regular expressions. MS team also calls MS Office wildcard patterns regexps, but they are not. Also, Lua has patterns, still, they are not regexps. – Wiktor Stribiżew Aug 31 '18 at 17:20
  • What does it make not a regular expression, this is the ISO-9075:2016-1 (SQL:2016) regular expression syntax; which in the case of `similar to` requires full matches, not substring matches. – Mark Rotteveel Aug 31 '18 at 17:21
  • @MarkRotteveel Well, I have little time now for a deeper reasearch, but `%` and `_` are not regex things, they only used in wildcard patterns. These look like "expanded" wildcard patterns. – Wiktor Stribiżew Aug 31 '18 at 17:28
  • It does support alternation. If R1 and R2 are a valid expression, then so is (R1)|(R2). See the production rules in [its documentation](https://www.firebirdsql.org/file/documentation/reference_manuals/fblangref25-en/html/fblangref25-commons-predicates.html#fblangref25-commons-predsiimilarto) – Mark Rotteveel Aug 31 '18 at 17:29
  • The `%` is the equivalent of `.*` and `_` equivalent of `.`. This has been done because these already have a similar meaning in `LIKE`. Personally I'd have preferred if the SQL standards committee would have used the more common Perl-like syntax... – Mark Rotteveel Aug 31 '18 at 17:31
  • As alternation is supported, chaining multiple `similar to` using `or` is not necessary. – Mark Rotteveel Aug 31 '18 at 18:14