I need to create some tables in Hive, and for this I want to insert the data into HDFS so that a Hive table is created automatically.

Consider this example: hive table

I need this information stored in Hive. Could you give me an example of how I should insert the data into HDFS for this?

Cristina

1 Answer

Hive tables are not created automatically when you upload data to HDFS. You have to create them yourself, either manually or programmatically from your app. The command for creating an (external) Hive table is basically:

hive> create external table <table_name> (param_1_name param_1_type, ...) row format delimited fields terminated by ',' location '/user/<your_hdfs_user>/path/to/the/data/directory/';

The above is for structured data in a CSV-like format. If the data is written in JSON, then you will need to use a SerDe.
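To make the template above concrete, here is a minimal sketch that fills it in from the shell. The table name (people), the columns and the HDFS directory are hypothetical, since the table in the question is only shown as an image; adapt them to your data.

```shell
#!/bin/sh
# Print a "create external table" statement for a comma-delimited data directory.
# Table name, columns and location are hypothetical placeholders.
external_table_ddl() {
    table_name=$1   # e.g. people
    columns=$2      # e.g. "id int, name string, age int"
    location=$3     # HDFS directory that holds the CSV files
    printf "create external table %s (%s) row format delimited fields terminated by ',' location '%s';\n" \
        "$table_name" "$columns" "$location"
}

# Usage (needs the hive CLI on the PATH, so it is left commented out):
#   hive -e "$(external_table_ddl people 'id int, name string, age int' '/user/me/people/')"
```

The function only builds the statement; passing it to hive -e is what actually creates the table. Any CSV file later placed in that directory becomes visible through the table.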

That said, once the Hive tables are created, a very easy way to add new data to them is to upload the data into HDFS directly. This can be done through WebHDFS. For instance, to add a file to the HDFS folder the Hive table points to (using curl as the HTTP client):

$ curl -i -X PUT "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=CREATE
                [&overwrite=<true |false>][&blocksize=<LONG>][&replication=<SHORT>]
                [&permission=<OCTAL>][&buffersize=<INT>]"

You will receive a redirection that must be followed:

HTTP/1.1 307 TEMPORARY_REDIRECT
Location: http://<DATANODE>:<PORT>/webhdfs/v1/<PATH>?op=CREATE...
Content-Length: 0 

Thus, perform the PUT on the redirection URL:

curl -i -X PUT -T <LOCAL_FILE> "http://<DATANODE>:<PORT>/webhdfs/v1/<PATH>?op=CREATE..."

(BTW, curl can automatically follow the redirections if you use the -L option).
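The two steps above can be wrapped in a small script. This is only a sketch: the host name, the port (50070 is assumed here as the NameNode HTTP port) and the file paths are placeholders, and the helper names are made up for illustration.

```shell
#!/bin/sh
# Sketch of the two-step WebHDFS upload. Host, port and paths are placeholders.

# Build the NameNode URL for a WebHDFS operation on an absolute HDFS path.
webhdfs_url() {
    host=$1; port=$2; hdfs_path=$3; op=$4
    printf 'http://%s:%s/webhdfs/v1/%s?op=%s\n' "$host" "$port" "${hdfs_path#/}" "$op"
}

# Read raw HTTP response headers on stdin and print the Location value.
redirect_location() {
    tr -d '\r' | sed -n 's/^[Ll]ocation: *//p' | head -n 1
}

# Step 1: PUT without a body to get the 307 redirect.
# Step 2: PUT the file contents to the DataNode URL from the Location header.
webhdfs_upload() {
    local_file=$1; create_url=$2
    datanode_url=$(curl -s -i -X PUT "$create_url" | redirect_location)
    if [ -z "$datanode_url" ]; then
        echo "no redirect received from $create_url" >&2
        return 1
    fi
    curl -s -i -X PUT -T "$local_file" "$datanode_url"
}

# Usage (needs a reachable cluster, so it is left commented out):
#   webhdfs_upload data.csv "$(webhdfs_url namenode.example.com 50070 /user/me/people/data.csv CREATE)"
```

Following the redirect by hand, instead of relying on -L, keeps the file body out of the first request and makes a missing redirect easy to spot.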

Once the file is created, you can append new data to the existing file by using the POST method (op=APPEND, as stated in the documentation).
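Appending follows the same redirect pattern as creating the file, only with POST and op=APPEND. A rough sketch, again with a placeholder host, port and path and made-up function names:

```shell
#!/bin/sh
# Sketch of appending a local file to an existing HDFS file through WebHDFS.

# Build the op=APPEND URL for an absolute HDFS path.
append_url() {
    host=$1; port=$2; hdfs_path=$3
    printf 'http://%s:%s/webhdfs/v1/%s?op=APPEND\n' "$host" "$port" "${hdfs_path#/}"
}

# POST once to obtain the redirect, then POST the new data to the DataNode.
webhdfs_append() {
    local_file=$1; url=$2
    datanode_url=$(curl -s -i -X POST "$url" | tr -d '\r' | sed -n 's/^[Ll]ocation: *//p' | head -n 1)
    [ -n "$datanode_url" ] || return 1
    curl -s -i -X POST -T "$local_file" "$datanode_url"
}

# Usage (needs a reachable cluster, so it is left commented out):
#   webhdfs_append new_rows.csv "$(append_url namenode.example.com 50070 /user/me/people/data.csv)"
```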

HTH

frb
  • Hello, thanks for your answer! But what about inserting data using Hive? Will each insert rewrite the whole table, or just append the new line at the end of the file? Thanks in advance. – Mehdi TAZI Mar 01 '16 at 11:18
  • AFAIK, Hive has supported insertions since version 0.14.0. Before that, my impression is that Hive focused on exploiting data that was updated by appending to the underlying HDFS files. Apart from that, tables could be created from the results of a query. Please check: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-InsertingvaluesintotablesfromSQL – frb Mar 04 '16 at 15:05
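
For the insertion syntax mentioned in the last comment, a minimal sketch could look like the following. The table (people) and the values are hypothetical, and the script only writes a HiveQL file; running it needs Hive 0.14.0 or later.

```shell
#!/bin/sh
# Write a small HiveQL script; the table name and the values are hypothetical.
cat > insert_example.hql <<'EOF'
-- Requires Hive 0.14.0 or later.
insert into table people values (1, 'Ann', 30), (2, 'Bob', 41);
EOF

# Run it with the Hive CLI (left commented out, it needs a Hive installation):
#   hive -f insert_example.hql
```

As far as I know, insert into adds rows to the data already in the table, while insert overwrite replaces it, so an insert of this kind should not rewrite the whole table.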