0

My csv output file has few rows that are split into two cells because data in Hive table has string entries with ; symbol which causes the split.


PROPER ROW
'xxx-xxxxxx','value','NULL','value','v','value','value','1.0','value','0.0','value',value,'value','value'

ODD ROW
'xxx-xxxxx','value','NULL','value','v','value','value','value','value','value','value','value','VERY LONG NAME; 

SECOND CELL OF ODD ROW
 ;VERY LONG NAME CONTINUED','VERY LONG NAME '

I tried to change output format and field size but not success. Row still gets split into two cells.

beeline -u jdbc:hive2://host:10000/default -n user -p xxxxx --silent=true --outputformat=csv -f sql_code.sql > output.csv

I want my rows not being split into multiple cells. This can be done by escaping ; symbol

'xxx-xxxxxx','value','NULL','value','v','value','value','1.0','value','0.0','value',value,'value','value'

'xxx-xxxxx','value','NULL','value','v','value','value','value','value','value','value','value','VERY LONG NAME VERY LONG NAME CONTINUED','value'

marcin2x4
  • 1,321
  • 2
  • 18
  • 44

2 Answers2

1

You can use the option "--outputformat=dsv".By default, the delimiter is a pipe (|). You can change it using the option --delimiterForDSV=','.

The next example works:

create table temp.test_name           (                                                                                                                                                                                     
first_name  string
,last_name  string
,zipcode   string
)
partitioned by (part string)
stored as parquet location '../temp.db/test_name' tblproperties("parquet.compression=SNAPPY")
;

insert into temp.test_name partition(part='A') values
('David','David','00')
,('Ellen', 'Ellen','00')
,('David','David','00')
,('David', 'VERY LONG NAME; VERY LONG NAME CONTINUED','00');

Beeline command that returns 4 rows, the same as the contained in the table:

beeline --color=true -u "jdbc..." --outputformat=dsv --delimiterForDSV=',' -f sql_code.hql > output.csv

For more information, check next documentation: https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients#HiveServer2Clients-Separated-ValueOutputFormats

0

The below query concats the string separated by comma to a single column in hive:

select col1, concat('"',col2,'"')col2, col3 from table;

  • As it’s currently written, your answer is unclear. Please [edit] to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – Community Sep 04 '22 at 19:10