1

How can I clean a column using Hive HQL containing this type of string

["A","B","C"]
["ABC","RFG","ERD","BAC"]

in order to get only the values inside " " separated by the ,?

Desired output

A,B,C
ABC,RFG,ERD,BAC

I've tried to use regexp_replace:

regexp_replace(s_agr.sig_tecnologia,'["\]\[]','')

but I'm receiving this error below:

Error: Error while compiling statement: FAILED: SemanticException [Error 10014]: Line 53:1 Wrong arguments '''': No matching method for class org.apache.hadoop.hive.ql.udf.UDFRegExpReplace with (array<string>, string, string). Possible choices: _FUNC_(string, string, string) (state=42000,code=10014)
Henrique Branco
  • 1,778
  • 1
  • 13
  • 40
  • Not sure if hive supports this syntax for the translate function, but try `select translate('["ABC","RFG","ERD","BAC"]','"[]','')` – Radagast Oct 19 '20 at 16:13

3 Answers3

3

How about replace()?

select replace(replace(replace(col, '"', ''), '[', ''), ']', '')
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
2

I've solved the problem with this question here.

The problem is because I was using collect_set to generate all those values together and they were of type array of strings not strings.

If I then use concat_ws(',',collect_set(column)) my problem is solved.

Henrique Branco
  • 1,778
  • 1
  • 13
  • 40
1

You can try the below -

select regexp_replace(s_agr.sig_tecnologia, '\\[\\"\\]', '')
Fahmi
  • 37,315
  • 5
  • 22
  • 31