1

Good morning. I am looking for the equivalent in Hive of REGEXP_SUBSTR, which I use in other databases like Teradata.

My specific case is: I have a field that has a string. It looks something like this:

Row1: ABC ACC ADF AFA BAC CAF
Row2: ACC BAC CAG HOL HRR NOM
Row3: ACC ARR BBA CAF LOM NOM

In my WHERE clause, I am using RLIKE to look for specific parts of the string, like:

WHERE mystring RLIKE 'BAC|BBA'

What I'd like to do is also return whichever part matched as a column. So the result would be:

Row1: BAC
Row2: BAC
Row3: BBA

In other databases, I would do this by putting this in the SELECT statement: REGEXP_SUBSTR(mystring,'(BAC|BBA)\S*'). Is there an equivalent function in Hive? If so, what is the syntax?

I've looked at a few different Hive functions but they seem to expect the string to be in the same place each time, which is not the case here. But I admittedly could be misunderstanding the syntax.

leftjoin
  • 36,950
  • 8
  • 57
  • 116
lottidah
  • 11
  • 1
  • 3

1 Answers1

3

Use regexp_extract and double-slash for special chars like \s:

 select regexp_extract('ABC ACC ADF AFA BAC CAF','(BAC|BBA)\\s*');

Read more about syntax here: Hive Language Manual - StringFunctions

leftjoin
  • 36,950
  • 8
  • 57
  • 116