1

Is there a way to match zz-10% in find_in_set?

For example:

select find_in_set('zz-1000','zz-10%,zz-2000,zz-3000');

This should return 1 but Impala doesn't support it. I am wondering if there is some trick with Regex to workaround? The find_in_set seems to do only exact match.

Ideally this should return 1 only as I want to avoid hardcoding a bunch of zz-10% variations.

This is the definition of this function from https://www.cloudera.com/documentation/enterprise/5-14-x/topics/impala_string_functions.html

find_in_set(string str, string strList)

Purpose: Returns the position (starting from 1) of the first occurrence of a specified string within a comma-separated string. Returns NULL if either argument is NULL, 0 if the search string is not found, or 0 if the search string contains a comma. Return type: int

I cannot change zz-1000 (the first param) because it's basically a Column. I could do a bunch of IF / CASE WHEN though if there is a way.

Thanks.

UPDATE 1

I tried this:

select find_in_set('zz-1000','zz-10\d+,zz-2000,zz-3000');

And got this:

+----------------------------------------------------+
| find_in_set('zz-1000', 'zz-10\d+,zz-2000,zz-3000') |
+----------------------------------------------------+
| 0                                                  |
+----------------------------------------------------+

So that doesn't work either.

HP.
  • 19,226
  • 53
  • 154
  • 253
  • I'm unfamiliar with impala. Is `%` just a wildcard? If so, you could use `zz-10\d+` meaning zz-10 followed by one or more numerical characters. – emsimpson92 Jul 09 '18 at 23:04
  • It doesn't seem to work. I had UPDATE 1 posted. – HP. Jul 09 '18 at 23:08
  • that's because you mixed them up. Try `find_in_set('zz-10\d+', 'zz-1000,zz-200,zz-3000')` This will only work if find_in_set supports regex patterns though... – emsimpson92 Jul 09 '18 at 23:09
  • From what I've seen online it's not looking like it's supported. [Maybe this question can help you](https://stackoverflow.com/questions/21130690/mysql-usage-of-regexp-instead-of-find-in-set) – emsimpson92 Jul 09 '18 at 23:11
  • That doesn't work either :( I am thinking a workaround by search for `%` chars in the second param and manually remove it. Then "cut" the length of the search string?! – HP. Jul 09 '18 at 23:15
  • try checking out that other question I linked – emsimpson92 Jul 09 '18 at 23:16
  • That gave me False. But even if it works, I need to get the Index though. – HP. Jul 09 '18 at 23:33

1 Answers1

1

What about to use REGEXP_LIKE function:

+----------------------------------------------+
| regexp_like('zz-1000', 'zz-10\\d+$|zz-2000') |
+----------------------------------------------+
| true                                         |
+----------------------------------------------+

When you have a static number of strings to compare, we can try this:

SELECT CASE
   WHEN regexp_like('zz-1000', 'zz-10\\d+$') THEN 1
   WHEN regexp_like('zz-1000', 'zz-2000')    THEN 2
   ELSE 0
   END;
Oleksandr Yarushevskyi
  • 2,789
  • 2
  • 17
  • 24
  • Sure but how would I make it to give me the Index return? – HP. Jul 10 '18 at 18:31
  • @HP I updated the answer with a solution using CASE expression. Hope, it will work for you. – Oleksandr Yarushevskyi Jul 10 '18 at 20:01
  • That might work but the number of items in this list `'zz-10%,zz-2000,zz-3000'` could go to a hundred... – HP. Jul 10 '18 at 22:59
  • In Impala, I am wondering if there is a way to split the comma delimited string into rows and do regexp_like on each row? Then that would be easier than using `find_in_set` – HP. Jul 10 '18 at 23:00