11

What would be the right way to write a Hive query with multiple LIKE operators like this:

SELECT * 
FROM some_table
WHERE
some_col LIKE '%abc%'
OR
some_col LIKE '%xyz%'
OR
some_col LIKE '%pqr%'
OR
... (some more LIKE statements)

I tried doing the above as well as

WHERE some_col LIKE '%abc|pqr|xyz%' 

but they didn't return any results. It works fine if I write separate queries, i.e.

WHERE some_col LIKE '%abc%' -> returns results

and

WHERE some_col LIKE '%pqr%' -> also returns results
Shobit
  • 776
  • 1
  • 6
  • 20
  • 1
    There is no good reason for Hive to fail at processing multiple LIKE expressions combined by OR. Could you elaborate on the table type (external vs. managed, partitioned or not, Text vs. RCFile vs. Orc vs. whatever) and the column definition (a plain String? any Null values in there?) – Samson Scharfrichter Nov 10 '15 at 17:33
  • `WHERE some_col rlike 'abc|pqr|xyz'` works. – Zahra Sep 22 '17 at 18:08

5 Answers5

20

From the docs:

A RLIKE B

NULL if A or B is NULL, TRUE if any (possibly empty) substring of A matches the Java regular expression B, otherwise FALSE. For example, 'foobar' RLIKE 'foo' evaluates to TRUE and so does 'foobar' RLIKE '^f.*r$'.


A REGEXP B

Same as RLIKE.

So, use

WHERE some_col RLIKE 'abc|pqr|xyz' 
mattinbits
  • 10,370
  • 1
  • 26
  • 35
13

You can probably use rlike(regular_expression).

WHERE some_col RLIKE '*abc*|*pqr*|*xyz*' 
Shivanand Pawar
  • 547
  • 4
  • 16
  • 1
    This is what I tried but did not get results. Are you suggesting that I try WHERE some_col RLIKE '%abc|pqr|xyz%' – Shobit Nov 10 '15 at 09:46
  • 1
    Yeah, try with rlike for regular expression. Sorry for the typing error. I have edited the answer. – Shivanand Pawar Nov 10 '15 at 10:26
  • 3
    Does not work for me. I think it should be: WHERE some_col RLIKE 'abc|pqr|xyz' – streof Aug 11 '16 at 14:18
  • Thanks for the pointer. Earlier I was trying `column RLIKE '*abc*' OR column RLIKE '*pqr*' OR column RLIKE '*xyz*'` But this was very heavy on processing. However if I put as mentioned int eh solution, it is working fine. Any explanation for this will be a help? – SrinR Nov 03 '16 at 13:24
4

I believe that the issue might be you need to group the like statement. Your Example done:

SELECT * 
FROM some_table
WHERE
(some_col LIKE '%abc%'
OR
some_col LIKE '%xyz%'
OR
some_col LIKE '%pqr%')
BalaramRaju
  • 439
  • 2
  • 8
1

You can try to use UNION if you have to use multiple condition in LIKE like this:

SELECT * FROM some_table WHERE some_col LIKE '%abc%'
UNION
SELECT * FROM some_table WHERE some_col LIKE '%xyz%'
UNION
SELECT * FROM some_table WHERE some_col LIKE '%pqr%'
Rahul Tripathi
  • 168,305
  • 31
  • 280
  • 331
  • OK, but that would probably require 4x more resources than a single query with ORs -- and if Hive fails at something as simple as boolean expressions, then it can fail at anything :-/ – Samson Scharfrichter Nov 10 '15 at 17:42
0

If all you need is to check for a list of specific substrings, you could use a different approach e.g.

WHERE InStr(some_col, 'abc') +InStr(some_col, 'pqr') +... >0

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-StringFunctions

Samson Scharfrichter
  • 8,884
  • 1
  • 17
  • 36