1

I am new to Hive regex matching and struggling to find the right pattern for matching word boundaries:

haystack RLIKE concat('(?i)\b', 'needle', '\b')

doesn't return anything.

Sample values which I have in DB:

haystack
---------
needless to say
this is a needle
so many (needle)
these are needles

When I use haystack RLIKE concat('(?i)', 'needle'), it returns me all the rows but I am actually looking for this is a needle.

leftjoin
  • 36,950
  • 8
  • 57
  • 116
deGee
  • 781
  • 1
  • 16
  • 34
  • are you looking for a logic like - i should come first and then needle? Then you can use below logic - `where instr(col,'i') >0 AND instr(col,'needle') >0 AND instr(col,'i') < instr(col,'needle') ` – Koushik Roy Mar 02 '21 at 04:36

1 Answers1

1

In Hive use two backslashes: \\b

Demo:

with mytable as (
select stack(4,
'needless to say',
'this is a needle',
'so many (needle)',
'these are needles'
) as haystack
)

select haystack, haystack rlike concat('(?i)\\b', 'needle', '\\b') from mytable;

Result:

haystack             _c1
needless to say      false
this is a needle     true
so many (needle)     true
these are needles    false

Note that so many (needle) is also matched because ( and ) are not word characters.

leftjoin
  • 36,950
  • 8
  • 57
  • 116