3

I'm having issue while downloading data from hive via beeline.

I want to download a table into file with csv as field delimiter. If the column value has comma, then that value needs to be enclosed by quote.

I know that beeline provides outputformat csv2 for this use case.

But I'm not getting the expected results out.

I've created sample table with data for this purpose.

    0: jdbc:hive2://ip:10000/default> DESC samp_ret;
+-------------+------------+----------+--+
|  col_name   | data_type  | comment  |
+-------------+------------+----------+--+
| intvar      | int        |          |
| stringvar1  | string     |          |
| stringvar2  | string     |          |
+-------------+------------+----------+--+
3 rows selected (0.081 seconds)
0: jdbc:hive2://ip:10000/default>  SELECT * FROM samp_ret ;
+------------------+----------------------+----------------------+--+
| samp_ret.intvar  | samp_ret.stringvar1  | samp_ret.stringvar2  |
+------------------+----------------------+----------------------+--+
| 1                | one,ONE              | One                  |
| 2                | 2                    | Two                  |
| 3                |                      | Three                |
| 4                | four,FOUR            |                      |
| 5                | five'FIVE            | Five                 |
| 6                | six',SIX             | Six                  |
+------------------+----------------------+----------------------+--+
6 rows selected (0.164 seconds)

I'm trying to download this data into app server, but for some reason, I'm not seeing the records containing comma enclosed in quote.

Beeline command used is ,

beeline -u jdbc:hive2://ip:10000/default?zeroDateTimeBehavior=round -n yumecorp yumehive org.apache.hive.jdbc.HiveDriver --showHeader=false --outputformat=csv2 --silent -e "SELECT * FROM dev.samp_ret " > /tmp/samp_ret.out

Output is ,

enter image description here

As you can see in the output, If the column value has comma, then the it's displayed as ^@

Let me know if I'm missing something here.

notNull
  • 30,258
  • 4
  • 35
  • 50
Makubex
  • 419
  • 3
  • 19

1 Answers1

3

Export the below property in the shell before executing your beeline command

bash$ export HADOOP_CLIENT_OPTS="-Ddisable.quoting.for.sv=false"

Then execute the beeline export command

beeline -u jdbc:hive2://ip:10000/default?zeroDateTimeBehavior=round -n yumecorp yumehive org.apache.hive.jdbc.HiveDriver --showHeader=false --outputformat=csv2 --silent -e "SELECT * FROM dev.samp_ret " > /tmp/samp_ret.out

Example:

I have 2 string columns in a tables after

hive> select * from i;
+-------+---------+--+
| i.id  | i.name  |
+-------+---------+--+
| 1     | kl,kl   |
| 2     | ll,mm   |
+-------+---------+--+

Now exporting to csv file with enclosed "" for name column values

bash$ export HADOOP_CLIENT_OPTS="-Ddisable.quoting.for.sv=false"
bash$ beeline --showHeader=false --outputformat=csv2 --silent -e "SELECT * FROM default.i" > samp_ret.out
bash$ cat samp_ret.out 
1,"kl,kl"
2,"ll,mm"

Refer to this hive Jira for more details.

notNull
  • 30,258
  • 4
  • 35
  • 50
  • Thanks. I initially tried it along the beeline command which didn't seem to work. It makes sense now. – Makubex Oct 03 '18 at 08:25