57

When loading data from HDFS into Hive using the

LOAD DATA INPATH 'hdfs_file' INTO TABLE tablename;

command, it looks like the hdfs_file is moved into the hive/warehouse directory. Is it possible (and how?) to copy it instead of moving it, so that the file can still be used by another process?

Athafoud
  • 2,898
  • 3
  • 40
  • 58
Suge
  • 2,808
  • 3
  • 48
  • 79

3 Answers

91

From your question I assume that you already have your data in HDFS, so you don't need `LOAD DATA`, which moves the files to the default Hive location /user/hive/warehouse. You can simply define the table using the `external` keyword, which leaves the files in place but creates the table definition in the Hive metastore. See here: Create Table DDL. E.g.:

create external table table_name (
  id int,
  myfields string
)
location '/my/location/in/hdfs';

Please note that the format you use might differ from the default (as mentioned by JigneshRawal in the comments). You can use your own delimiter, for example when using Sqoop:

row format delimited fields terminated by ','
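Putting the two snippets together, a full definition might look like the following. This is only a sketch: the path, table name, and columns are the placeholders from above, and it assumes comma-delimited text files already sitting in that HDFS directory.

```sql
-- Sketch: comma-delimited files are assumed to already exist under this directory
create external table table_name (
  id int,
  myfields string
)
row format delimited fields terminated by ','
stored as textfile
location '/my/location/in/hdfs';
```

Since the table is external, dropping it later removes only the metastore entry; the files under the location stay in HDFS, so other processes can keep reading them.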
Dag
  • 10,079
  • 8
  • 51
  • 74
  • @Dag, I have a question on this. When a Hive table is created using the "Create Table" command, the file gets moved from the HDFS location to '/user/hive/warehouse/Table/'. Is this new location also on HDFS, or is it just a local directory? I assume it should also be an HDFS location, am I right? – Ragav May 22 '14 at 19:05
  • When you execute a simple `create table` no data is moved, and the table is created in HDFS in the warehouse directory. AFAIK the data is moved inside HDFS when a LOCATION is given, but no `external` keyword was used. (But I could be wrong, please recheck it yourself) – Dag May 23 '14 at 09:56
  • 5
    @Dag I tried the above way of creating the table using the keyword 'EXTERNAL', but the first time it created the table with all values as NULL. The following line needs to be added when creating the table schema: ROW FORMAT DELIMITED FIELDS TERMINATED BY ','. Please correct me if I am wrong. – Jignesh Rawal Jul 19 '15 at 07:44
  • @JigneshRawal that's a very specific situation which should not be applied to Dag's solution. Also, that statement belongs in the table creation, whether external or managed, not in some separate schema-creation step – mlegge Oct 27 '15 at 19:32
  • 1
    @Dag, I would consider adding what Jignesh has mentioned, since this seems to be the default storage format that is used in many `Sqoop` tutorials. You could add it as an extra caveat. – Brian Feb 09 '16 at 20:09
  • @BrianVanover Thank you for the suggestion +1, I added a note to the answer. – Dag Feb 10 '16 at 13:28
  • I get the error `Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:hdfs://path_to_filename.txt is not a directory or unable to create one)`. I am trying to load content from that txt file into a newly created table without moving/deleting the txt file. – Bikash Gyawali May 26 '19 at 20:51
5

I found that when you use EXTERNAL TABLE and LOCATION together, Hive creates the table and initially no data is present (assuming your data location is different from the Hive 'LOCATION').

When you use the 'LOAD DATA INPATH' command, the data gets MOVED (instead of copied) from the data location to the location you specified while creating the Hive table.

If a location is not given when you create the Hive table, the internal Hive warehouse location is used, and the data will be moved from your source data location to the internal Hive data warehouse location (i.e. /user/hive/warehouse/).
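The three behaviors above can be sketched as follows (the paths and table names here are hypothetical):

```sql
-- 1. External table with LOCATION: nothing is moved; Hive reads whatever
--    files are (or later appear) under the given directory.
create external table ext_t (id int) location '/data/source_dir';

-- 2. LOAD DATA INPATH on a table: the file is MOVED from its HDFS location
--    into the table's directory, so the original path no longer holds it.
load data inpath '/data/staging/file.txt' into table ext_t;

-- 3. Managed table without LOCATION: its directory lives under the warehouse,
--    e.g. /user/hive/warehouse/managed_t, and LOAD DATA moves files there.
create table managed_t (id int);
```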

Avinash
  • 1,363
  • 2
  • 15
  • 28
4

An alternative to 'LOAD DATA' is available, in which the data is not moved from your existing source location to the Hive data warehouse location.

You can use the ALTER TABLE command with the 'LOCATION' option. Here is the required command:

ALTER TABLE table_name ADD PARTITION (date_col='2017-02-07') LOCATION 'hdfs/path/to/location/'

The only condition here is that the location should be a directory instead of a file.
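One additional requirement is that the table must be partitioned for ADD PARTITION to apply. As a sketch (table and column names are hypothetical, and the location path is the placeholder from above):

```sql
-- Create a partitioned external table; no data is moved at this point.
create external table table_name (id int)
partitioned by (date_col string);

-- Register the existing directory as a partition; the files stay in place.
alter table table_name add partition (date_col='2017-02-07')
location 'hdfs/path/to/location/';
```

ADD PARTITION only records the directory in the metastore, so the source files remain where they are and can still be used by other processes.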

Hope this will solve the problem.

Avinash
  • 1,363
  • 2
  • 15
  • 28