
Hi all, I have been transforming my data into Hive with Talend.

I run a few regexes over it. One of the columns contains values like these:

KRW3TR.899877.GR0054656*DR.798012...2..............GR0054656*EUR*
KRW3TR.899877.GR0054656*DR.798012...2..............GR0054656*EUR*DDT*
KRW3TR.899877.GR0054656*DR.798012...2..............GR0054656*EUR*CCT*

What I am trying to do is extract the last field, i.e. DDT or CCT.

(As the examples show, that last field is sometimes missing.)

And I get this error from MapReduce:

Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to execute method public java.lang.String org.apache.hadoop.hive.ql.udf.UDFRegExpExtract.evaluate(java.lang.String,java.lang.String,java.lang.Integer)  on object org.apache.hadoop.hive.ql.udf.UDFRegExpExtract@a22c4d8 of class org.apache.hadoop.hive.ql.udf.UDFRegExpExtract with arguments 

And the other one is:

Caused by: java.lang.reflect.InvocationTargetException
Caused by: java.util.regex.PatternSyntaxException: Dangling meta character '*' near index 9
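The index in the second error hints at what went wrong. Below is a minimal Java sketch. It assumes that the regex which actually reached `java.util.regex` had lost the backslash before the literal star, so `[^*]*\*` arrived as `[^*]**`. Under that assumption, the sketch reproduces the same message. The class and method names are made up for illustration.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class DanglingStarDemo {
    // Assumption: the regex text the engine saw after the backslash before '*' was consumed.
    static final String STRIPPED = "^(?:[^*]**){3}([^*]*)";
    // The same regex with the literal star still escaped, as intended.
    static final String INTENDED = "^(?:[^*]*\\*){3}([^*]*)";

    // Returns the compile error for a regex, or null if it compiles.
    static PatternSyntaxException compileError(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            return e;
        }
    }

    public static void main(String[] args) {
        PatternSyntaxException e = compileError(STRIPPED);
        // Index 9 is the second '*' in "[^*]**": a quantifier directly after
        // another quantifier, which java.util.regex rejects.
        System.out.println(e.getDescription() + " near index " + e.getIndex());
        System.out.println(compileError(INTENDED) == null ? "intended regex compiles" : "unexpected error");
    }
}
```

So the failure is a compile-time problem with the pattern text itself. It happens before any row is examined.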

This is the expression I use to extract it:

REGEXP_EXTRACT(columnrr,'^(?:[^*]*\\*){3}([^*]*)',1) as TYPE
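The DDT/CCT question can be separated from the escaping problem. The small Java sketch below runs the intended regex directly through `java.util.regex` on the three sample rows. `extractType` is a hypothetical helper. It approximates `REGEXP_EXTRACT(..., 1)` by returning group 1 of the first match, or an empty string when there is no match; treat that as an assumption about Hive's behavior.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExtractTypeDemo {
    // Skip three '*'-terminated fields, then capture everything up to the next '*'.
    // In Java source "\\*" is the two characters \* , i.e. a literal star for the regex engine.
    static final Pattern TYPE = Pattern.compile("^(?:[^*]*\\*){3}([^*]*)");

    // Rough stand-in for REGEXP_EXTRACT(col, regex, 1): group 1 of the first match, or "".
    static String extractType(String value) {
        Matcher m = TYPE.matcher(value);
        return m.find() ? m.group(1) : "";
    }

    public static void main(String[] args) {
        String[] rows = {
            "KRW3TR.899877.GR0054656*DR.798012...2..............GR0054656*EUR*",
            "KRW3TR.899877.GR0054656*DR.798012...2..............GR0054656*EUR*DDT*",
            "KRW3TR.899877.GR0054656*DR.798012...2..............GR0054656*EUR*CCT*",
        };
        for (String row : rows) {
            System.out.println("[" + extractType(row) + "]"); // brackets make "" visible
        }
    }
}
```

This prints `[]`, `[DDT]` and `[CCT]`. A missing last field only means group 1 captures an empty string. It cannot cause the `PatternSyntaxException`.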

My questions are: Are the two errors related? Does it matter that DDT/CCT is only present in some rows? What should my regex look like?

Thank you.

1 Answer


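I found it. `*` is a reserved character in regex, so it has to reach the regex engine as `\*`, and that backslash has to survive every layer of string escaping on the way. Hive's '...' literal already turns `\\` into `\`. Most likely the query also goes through a Java string literal in Talend, which halves the backslashes once more. That would leave the engine a bare `*` (hence "dangling meta character near index 9") unless four backslashes are written. So the answer is: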

REGEXP_EXTRACT(columnrr,'^(?:[^*]*\\\\*){3}([^*]*)',1) as TYPE
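The following rough Java sketch shows why four backslashes are needed, by walking through the two escaping layers. It assumes the query is typed into a Talend component field, which Talend compiles as a Java string literal. Hive then parses the result as a '...' string literal. `unescapeHiveLiteral` is a hypothetical, simplified stand-in for Hive's unescaping, not Hive's actual code.

```java
import java.util.regex.Pattern;

class EscapingLayersDemo {
    // Hypothetical simplified model of Hive string-literal unescaping:
    // "\\" becomes "\" and "\x" becomes "x" for any other character.
    // Hive's real code has extra special cases (\n, \t, \0, \%, \_ ...) ignored here.
    static String unescapeHiveLiteral(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\' && i + 1 < s.length()) {
                out.append(s.charAt(++i)); // keep the escaped char, drop the backslash
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Layer 1 (assumed): the Talend field is a Java literal, so javac turns "\\" into one backslash.
        String fromQuestion = "^(?:[^*]*\\*){3}([^*]*)";   // HiveQL text contains \*
        String fromAnswer   = "^(?:[^*]*\\\\*){3}([^*]*)"; // HiveQL text contains \\*

        // Layer 2: Hive unescapes the '...' literal before handing it to java.util.regex.
        System.out.println(unescapeHiveLiteral(fromQuestion)); // ^(?:[^*]**){3}([^*]*)
        System.out.println(unescapeHiveLiteral(fromAnswer));   // ^(?:[^*]*\*){3}([^*]*)

        // Only the answer's version still has an escaped star when it reaches Pattern.
        Pattern.compile(unescapeHiveLiteral(fromAnswer));
    }
}
```

In a plain Hive CLI session (no Java layer in front), the question's original `\\*` form would be the one to use.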
