I have a string column description
in a hive table which may contain tab characters '\t'
, these characters are however messing some views when connecting hive to an external application.
is there a simple way to get rid of all tab characters in that column?. I could run a simple python program to do it, but I want to find a better solution for this.

- 555
- 2
- 7
- 34

- 781
- 4
- 10
- 16
5 Answers
regexp_replace
UDF performs my task. Below is the definition and usage from apache Wiki.
regexp_replace(string INITIAL_STRING, string PATTERN, string REPLACEMENT):
This returns the string resulting from replacing all substrings in INITIAL_STRING
that match the java regular expression syntax defined in PATTERN
with instances of REPLACEMENT
,
e.g.: regexp_replace("foobar", "oo|ar", "")
returns fb

- 3,541
- 10
- 32
- 43

- 2,366
- 1
- 14
- 5
Custom SerDe might be a way to do it. Or you could use some kind of mediation process with regex_replace:
create table tableB as
select
columnA
regexp_replace(description, '\\t', '') as description
from tableA
;

- 5,241
- 7
- 30
- 46
-
1this should be regexp_replace not regex_replace, it seems to be working so far – user1745713 Aug 07 '13 at 20:58
select translate(description,'\\t','') from myTable;
Translates the input string by replacing the characters present in the from string with the corresponding characters in the to string. This is similar to the translate function in PostgreSQL. If any of the parameters to this UDF are NULL, the result is NULL as well. (Available as of Hive 0.10.0, for string types)
Char/varchar support added as of Hive 0.14.0

- 179,855
- 19
- 132
- 245

- 91
- 1
- 1
-
Welcome to Stack Overflow! While this code snippet may solve the question, [including an explanation](//meta.stackexchange.com/questions/114762/explaining-entirely-code-based-answers) really helps to improve the quality of your post. Remember that you are answering the question for readers in the future, and those people might not know the reasons for your code suggestion. Please also try not to crowd your code with explanatory comments, this reduces the readability of both the code and the explanations! – Filnor May 04 '18 at 06:19
You can also use translate(). If the third argument is too short, the corresponding characters from the second argument are deleted. Unlike regexp_replace() you don't need to worry about special characters. Source code.

- 662
- 6
- 14
There is no OOTB feature at this moment which allows this. One way to achieve that could be to write a custom InputFormat and/or SerDe that will do this for you. You might this JIRA useful : https://issues.apache.org/jira/browse/HIVE-3751. (not related directly to your problem though).

- 34,076
- 8
- 57
- 79
-
I feel like this answer is outdated despite the JIRA not being closed – OneCricketeer Sep 13 '18 at 14:07