0

I have a dataset in which values are same except the number of semicolons in it resulting to different records.

For example if in a column one records has a;b;c and another record has a;;b;c, this is disabling the use of distinct function in my code. I want this to be treated as duplicate record for which ;; needs to be replaced with ;

How can we replace multiple ; with single ; in strings in a dataset in hive?

Vaibhav
  • 107
  • 1
  • 11

1 Answers1

0

You can use regexp_replace as defined in Hive UDFs

The first argument is the string that needs to get changed. So you can call it on your table like :

with t as 
   (SELECT "a\;\;\;b\;\;c\;d" as col ) 
SELECT regexp_replace(t.col, "\;+", "\;") as col from t

This should give you the output

+-------+
|    col|
+-------+
|a;b;c;d|
+-------+
philantrovert
  • 9,904
  • 3
  • 37
  • 61