Hi all, I have been transforming my data into Hive with Talend.
Along the way I run a few regexes. Here is a sample of the column one of them works on:
KRW3TR.899877.GR0054656*DR.798012...2..............GR0054656*EUR*
KRW3TR.899877.GR0054656*DR.798012...2..............GR0054656*EUR*DDT*
KRW3TR.899877.GR0054656*DR.798012...2..............GR0054656*EUR*CCT*
What I am trying to do is get the last sequence: DDT or CCT.
(As the examples show, that last sequence does not occur in every row.)
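Splitting a row on * shows where that optional code sits: it is the fourth field, and in the first row the fourth field is empty. This is a minimal Java sketch. FieldLayout and codeField are hypothetical names, and treating the code as always being the fourth field is an assumption drawn only from these three rows.

```java
// Hypothetical helper: shows which '*'-separated field holds the optional code.
public class FieldLayout {

    // ASSUMPTION: the code (DDT, CCT, ...) is always the 4th field (index 3).
    static final int CODE_FIELD = 3;

    static String codeField(String row) {
        // "[*]" matches a literal asterisk; limit -1 keeps trailing empty fields,
        // so a row ending in "EUR*" still has an (empty) 4th field.
        String[] fields = row.split("[*]", -1);
        return fields.length > CODE_FIELD ? fields[CODE_FIELD] : "";
    }

    public static void main(String[] args) {
        String[] rows = {
            "KRW3TR.899877.GR0054656*DR.798012...2..............GR0054656*EUR*",
            "KRW3TR.899877.GR0054656*DR.798012...2..............GR0054656*EUR*DDT*",
            "KRW3TR.899877.GR0054656*DR.798012...2..............GR0054656*EUR*CCT*"
        };
        for (String row : rows) {
            System.out.println("code = [" + codeField(row) + "]");
        }
    }
}
```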
The MapReduce job then fails with this error:
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to execute method public java.lang.String org.apache.hadoop.hive.ql.udf.UDFRegExpExtract.evaluate(java.lang.String,java.lang.String,java.lang.Integer) on object org.apache.hadoop.hive.ql.udf.UDFRegExpExtract@a22c4d8 of class org.apache.hadoop.hive.ql.udf.UDFRegExpExtract with arguments
And the other one is:
Caused by: java.lang.reflect.InvocationTargetException
Caused by: java.util.regex.PatternSyntaxException: Dangling meta character '*' near index 9
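The innermost cause is a java.util.regex compile error, so it can be reproduced outside Hive. This is a minimal sketch. It assumes, without confirmation, that the backslash in front of the literal * in the extraction pattern ^(?:[^*]*\*){3}([^*]*) was stripped somewhere between the Talend job and Hive, for example by one string-unescaping step too many. DanglingStarDemo and compileErrorIndex are hypothetical names.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical demo: compile the extraction pattern with java.util.regex,
// the engine named in the innermost "Caused by" line.
public class DanglingStarDemo {

    // The pattern as the regex engine should receive it: \* is a literal asterisk.
    static final String INTENDED = "^(?:[^*]*\\*){3}([^*]*)";

    // ASSUMPTION: the same pattern after the backslash before the literal '*'
    // has been consumed by an extra unescaping step.
    static final String BACKSLASH_LOST = "^(?:[^*]**){3}([^*]*)";

    // Returns -1 if the pattern compiles, otherwise the index Java reports.
    static int compileErrorIndex(String regex) {
        try {
            Pattern.compile(regex);
            return -1;
        } catch (PatternSyntaxException e) {
            System.out.println(e.getDescription() + " near index " + e.getIndex());
            return e.getIndex();
        }
    }

    public static void main(String[] args) {
        System.out.println("intended: " + compileErrorIndex(INTENDED));
        System.out.println("backslash lost: " + compileErrorIndex(BACKSLASH_LOST));
    }
}
```

In the second pattern, [^*]* is followed by a bare *, and a quantifier with nothing to quantify is what Java reports as a dangling meta character; index 9 is that second * in a row. Because the failure happens while compiling the pattern, it does not depend on the content of any row.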
This is the expression I use to extract it:
REGEXP_EXTRACT(columnrr,'^(?:[^*]*\\*){3}([^*]*)',1) as TYPE
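To check the pattern itself against the three sample rows, this sketch runs it through java.util.regex and takes group 1 of the first match, as REGEXP_EXTRACT(..., 1) is meant to. ExtractTypeDemo and extractType are hypothetical names. Returning an empty string when nothing matches is modelled on Hive's behaviour and should be treated as an assumption. The second pattern replaces \* with [*], a character class holding a literal asterisk. It contains no backslash, so no escaping layer in Talend or Hive can alter it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo: what group 1 of the TYPE pattern captures for each sample row.
public class ExtractTypeDemo {

    // Backslash form, exactly as the regex engine needs to see it (\* = literal '*').
    static final Pattern ESCAPED = Pattern.compile("^(?:[^*]*\\*){3}([^*]*)");

    // Backslash-free variant: [*] is a character class containing only '*'.
    static final Pattern BRACKETED = Pattern.compile("^(?:[^*]*[*]){3}([^*]*)");

    // Group 1 of the first match, or "" when the row has fewer than three '*'
    // (ASSUMPTION: mirrors how REGEXP_EXTRACT reports a non-match).
    static String extractType(Pattern pattern, String row) {
        Matcher m = pattern.matcher(row);
        return m.find() ? m.group(1) : "";
    }

    public static void main(String[] args) {
        String[] rows = {
            "KRW3TR.899877.GR0054656*DR.798012...2..............GR0054656*EUR*",
            "KRW3TR.899877.GR0054656*DR.798012...2..............GR0054656*EUR*DDT*",
            "KRW3TR.899877.GR0054656*DR.798012...2..............GR0054656*EUR*CCT*"
        };
        for (String row : rows) {
            System.out.println("[" + extractType(ESCAPED, row) + "] ["
                + extractType(BRACKETED, row) + "]");
        }
    }
}
```

With both patterns the first row yields an empty capture and the other two yield DDT and CCT, so a row without the code simply produces an empty TYPE.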
My questions are: Are these errors related? Does the error have anything to do with DDT and CCT appearing in only some rows? And what should my regex look like?
Thank you.