
I have a very basic question: how can I add a very simple table to Hive? My table is saved in a text file (.txt) in HDFS. I have tried to create an external table in Hive that points to this file, but when I run a SQL query (select * from table_name) I don't get any output. Here is an example of the code:

create external table Data (
    dummy INT,
    account_number INT, 
    balance INT, 
    firstname STRING, 
    lastname STRING, 
    age INT, 
    gender CHAR(1), 
    address STRING, 
    employer STRING, 
    email STRING,
    city STRING, 
    state CHAR(2)
)
LOCATION 'hdfs:///KibTEst/Data.txt';

KibTEst/Data.txt is the path of the text file in HDFS.

The rows in the table are separated by carriage returns, and the columns are separated by commas.

Thanks for your help!

mshabeeb

4 Answers

  1. You just need to create an external table pointing to your file's location in HDFS, with the delimiter properties set as below:

    create external table Data (
        dummy INT,
        account_number INT, 
        balance INT, 
        firstname STRING, 
        lastname STRING, 
        age INT, 
        gender CHAR(1), 
        address STRING, 
        employer STRING, 
        email STRING,
        city STRING, 
        state CHAR(2)
    )
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY ',' 
    LINES TERMINATED BY '\n'
    LOCATION 'hdfs:///KibTEst/Data.txt';
    
  2. You then only need to run a select query (because the file is already in HDFS, and the external table fetches data directly from it when the location is specified in the create statement). You can test using the select statement below:

SELECT * FROM Data;

Farooque
  • Using this code the table can be created successfully but when I run the SQL query it does not return any output (it only returns OK). This looks like the table in Hive is empty. I double-checked the table contents in HDFS by running `hadoop fs -cat /KibTest/Data.txt` which returns the table's contents. – mshabeeb Jul 01 '16 at 08:26
  • Update the location to '/KibTest/Data.txt' and then you will get the result, because if you run `hadoop fs -cat hdfs:///KibTEst/Data.txt` you will not get anything... – Farooque Jul 01 '16 at 08:28
  • This gave the same result. However, removing the file name from the path (so that the location becomes `/KibTest/`) gets the job done although all files in the directory are inserted into the table! So I have to create a new directory and move only this file into it. Do you know if there is any way to limit the input to one file in the path? – mshabeeb Jul 01 '16 at 08:49
  • To limit the table to one particular file in a directory, you have to give the location up to the file name instead of the directory name. – Farooque Jul 01 '16 at 08:52
  • Yes, I understand that, but as written in the previous comment, when I include the file name in the path the result is an empty table, whereas when I remove the file name from the path I can see the data of all files in that directory (including the one I need). – mshabeeb Jul 01 '16 at 14:38
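
As the comment thread converges on, Hive expects LOCATION to point at a directory rather than a single file, and every file in that directory becomes part of the table. A minimal sketch of the variant that worked for the asker, assuming `/KibTEst/` is a directory containing only Data.txt:

    create external table Data (
        dummy INT,
        account_number INT,
        balance INT,
        firstname STRING,
        lastname STRING,
        age INT,
        gender CHAR(1),
        address STRING,
        employer STRING,
        email STRING,
        city STRING,
        state CHAR(2)
    )
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY ','
    LINES TERMINATED BY '\n'
    -- LOCATION must be a directory; all files inside it are read as table data
    LOCATION 'hdfs:///KibTEst/';

To restrict a table to a single file, the usual approach (as the asker found) is to move that file into a directory of its own and point LOCATION there.
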

    create external table Data (
        dummy INT,
        account_number INT, 
        balance INT, 
        firstname STRING, 
        lastname STRING, 
        age INT, 
        gender CHAR(1), 
        address STRING, 
        employer STRING, 
        email STRING,
        city STRING, 
        state CHAR(2)
    )
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY ','
    STORED AS TEXTFILE
    LOCATION 'Your hdfs location for external table';

If the data is in HDFS, then use:

LOAD DATA INPATH 'hdfs_file_or_directory_path' INTO TABLE tablename

Then use `select * from table_name`.
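
To make the distinction concrete: the LOCATION clause attaches an external table to a directory that already holds the data, while LOAD DATA INPATH moves files from elsewhere in HDFS into the table's location. A minimal sketch of the second route, assuming the Data table from the question has already been created:

    -- LOAD DATA INPATH *moves* (does not copy) the file from /KibTEst/
    -- into the table's own directory
    LOAD DATA INPATH '/KibTEst/Data.txt' INTO TABLE Data;
    SELECT * FROM Data;
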

Taha Naqvi
  • Thanks a lot! By `Your hdfs location for external table` do you mean the location in HDFS (i.e. `hdfs:///KibTEst/Data.txt`)? What is the difference between this one and `hdfs_file_or_directory_path`? – mshabeeb Jul 01 '16 at 07:42

create external table Data (
        dummy INT,
        account_number INT, 
        balance INT, 
        firstname STRING, 
        lastname STRING, 
        age INT, 
        gender CHAR(1), 
        address STRING, 
        employer STRING, 
        email STRING,
        city STRING, 
        state CHAR(2)
    )
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY ','
    STORED AS TEXTFILE
    LOCATION '/Data';

Then load the file into the table:

LOAD DATA INPATH '/KibTEst/Data.txt' INTO TABLE Data;

Then

select * from Data;
Kishore
  • Thanks for your answer! I have tried inputting the same command but I get the following error (this is one of the several options I tried previously, and I always got the same error): ParseException line 16:26 mismatched input ',' expecting StringLiteral near 'BY' in table row format's field separator – mshabeeb Jul 01 '16 at 07:33
  • Now the table can be created without errors, but when I check its contents by running `select * from Data` I get no output (only OK). Using the load command does not work either; it gives the following error instead: **FAILED: SemanticException Line 1:17 Invalid path ''/KibTEst/Data.txt'': No files matching path hdfs://MACHINE_NAME/KibTEst/Data.txt** I have tried changing the path to `hdfs:///KibTEst/Data.txt` but it gives exactly the same error (with the same path). However, I believe a load statement should not be necessary, since the table is external and the data exists in HDFS. – mshabeeb Jul 01 '16 at 08:32
  • What is KibTEst? Where is your Data.txt placed? – Kishore Jul 01 '16 at 09:31
  • KibTEst is the name of the folder in HDFS where my file (Data.txt) is placed. – mshabeeb Jul 01 '16 at 14:38
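
One way to debug the "Invalid path" error from the comments above is to list the path from inside the Hive CLI before loading; the `dfs` command runs HDFS operations without leaving Hive. A short sketch, assuming the paths from this answer (note that HDFS paths are case-sensitive, so KibTEst and KibTest are different directories):

    -- confirm the source file is where you think it is
    dfs -ls /KibTEst/;
    -- LOAD DATA INPATH moves (not copies) the file into the table's directory
    LOAD DATA INPATH '/KibTEst/Data.txt' INTO TABLE Data;
    -- afterwards the file lives under the table location
    dfs -ls /Data/;
    select * from Data;
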

I hope the inputs below answer the question asked by @mshabeeb.

There are different ways to load data into a Hive table that is created as an external table. While creating the Hive external table you can either use the LOCATION option and specify the HDFS, S3 (in the case of AWS), or file location from which you want to load data, OR you can use the LOAD DATA INPATH option to load data from HDFS, S3, or a file location after creating the Hive table.

Alternatively, you can also use the ALTER TABLE command to load data into Hive partitions.

Below are some details:

  1. Using LOCATION - Used while creating the Hive table. In this case the data is already present at the specified location and is available in the Hive table as soon as it is created.

  2. Using LOAD DATA INPATH - This Hive command can be used to load data from a specified location. The point to remember here is that the data gets MOVED from the input path to the Hive warehouse path. Example: `LOAD DATA INPATH 'hdfs://cluster-ip/path/to/data/location/' INTO TABLE table_name`

  3. Using the ALTER TABLE command - Mostly this is used to add data from other locations into Hive partitions. In this case all partitions must already be defined and the values for the partitions must already be known; in the case of dynamic partitions this command is not required. Example: `ALTER TABLE table_name ADD PARTITION (date_col='2018-02-21') LOCATION 'hdfs://path/to/location/'`. This maps the partition to the specified data location (in this case, HDFS); however, the data will NOT be MOVED to Hive's internal warehouse location (see the sketch after this list).
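
For the third option, a minimal sketch of a partitioned external table (the table and column names are illustrative, not taken from the original question):

    -- hypothetical external table partitioned by date
    create external table sales (
        item STRING,
        amount INT
    )
    PARTITIONED BY (date_col STRING)
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY ','
    STORED AS TEXTFILE;

    -- map one partition to an existing HDFS directory; the files are NOT moved
    ALTER TABLE sales ADD PARTITION (date_col='2018-02-21')
    LOCATION 'hdfs://cluster-ip/path/to/location/';
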


Avinash