Hive tables are not created automatically when you upload data to HDFS. This is something you have to do manually, or programmatically from your app. The basic command for creating an (external) Hive table is:
hive> create external table <table_name> (param_1_name param_1_type, ...) row format delimited fields terminated by ',' location '/user/<your_hdfs_user>/path/to/the/data/directory/';
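For instance, a hypothetical table over comma-separated sensor readings could look like this (the table name, column names and path are just placeholders):
hive> create external table sensor_data (sensor_id string, ts bigint, value float) row format delimited fields terminated by ',' location '/user/<your_hdfs_user>/sensor_data/';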
The above is for structured data in CSV-like format. If the data is written in JSON, then you will need to use a serde.
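As a minimal sketch, assuming the HCatalog JsonSerDe that ships with Hive is available (the jar path, table name and columns below are just placeholders):
hive> add jar /path/to/hive-hcatalog-core.jar;
hive> create external table my_json_table (field_1 string, field_2 int) row format serde 'org.apache.hive.hcatalog.data.JsonSerDe' location '/user/<your_hdfs_user>/path/to/the/json/directory/';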
That being said, once the Hive tables are created, a very easy way to add new data to them is to upload the data into HDFS directly. This can be done through WebHDFS. For instance, to add a file to the HDFS folder the Hive table is pointing to (using curl as HTTP client):
$ curl -i -X PUT "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=CREATE
[&overwrite=<true |false>][&blocksize=<LONG>][&replication=<SHORT>]
[&permission=<OCTAL>][&buffersize=<INT>]"
You will receive a redirection that must be followed:
HTTP/1.1 307 TEMPORARY_REDIRECT
Location: http://<DATANODE>:<PORT>/webhdfs/v1/<PATH>?op=CREATE...
Content-Length: 0
Thus, perform the PUT on the redirection URL:
curl -i -X PUT -T <LOCAL_FILE> "http://<DATANODE>:<PORT>/webhdfs/v1/<PATH>?op=CREATE..."
(BTW, curl can automatically follow the redirections if you use the -L option.)
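In that case the whole upload can be done in one command, something like the following (host, port and paths are placeholders):
$ curl -L -i -X PUT -T <LOCAL_FILE> "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=CREATE"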
Once the file is created, you can append new data to it by using a POST method (op=APPEND, as stated in the documentation).
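The append follows the same two-step pattern as the creation, i.e. a request to the namenode followed by a data upload to the datanode it redirects to (host, port and paths are placeholders):
$ curl -i -X POST "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=APPEND[&buffersize=<INT>]"
$ curl -i -X POST -T <LOCAL_FILE> "http://<DATANODE>:<PORT>/webhdfs/v1/<PATH>?op=APPEND..."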
HTH