I have about 10 files in the same HDFS location. All files have exactly the same columns (about 15), and each has about 100 rows. Each file represents data I have received over the last 10 months (the data is refreshed monthly). I would like to create one Hive table that merges all of the data. The table should have 15 columns and about 1,000 rows.

I tried the code I usually use to create tables (see below). The script executes, but it only picks up data from one file and not the other 9.

CREATE EXTERNAL TABLE database.tablename (
UserID INT,
UserName String,
Department String,
State String
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/location/of/the/file/'
TBLPROPERTIES ("skip.header.line.count"="1");

I don't receive any errors, but I'm only getting some of the data, not all of it. Should I use completely different syntax, or can I edit the script above to get the results I need?

Any help is greatly appreciated! P.S. Very new to Hadoop/HIVE so I am trying to learn as I get hit with these different scenarios. Thank you all!

Rspktcod

1 Answer

Could you first make sure that all of the files are placed under the '/location/of/the/file/' location?
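
One way to check this from a Hive session is sketched below. It assumes the table was created exactly as in the question and that your client allows `dfs` commands (the Hive CLI does; some Beeline setups restrict them). `INPUT__FILE__NAME` is a Hive virtual column holding the path of the file each row was read from, so grouping by it shows which files actually contribute rows.

```sql
-- List what actually sits under the table location (files vs. subdirectories).
dfs -ls /location/of/the/file/;

-- Count rows per source file; a file that is being read shows up here.
-- With 10 files of about 100 rows each, roughly 10 groups would be expected.
SELECT INPUT__FILE__NAME AS source_file,
       COUNT(*)          AS row_count
FROM database.tablename
GROUP BY INPUT__FILE__NAME;
```

If only one file appears in the result, compare its path with the `dfs -ls` listing to see where the other files really live.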

If there are multiple directories inside the location the table points to ('/location/of/the/file/'), set these parameters in your current Hive session and then run the query:

SET hive.mapred.supports.subdirectories=TRUE;
SET mapred.input.dir.recursive=TRUE;
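
As a rough sketch of how to confirm whether subdirectories are involved and whether the settings help: the recursive listing below shows any nested directories, and the count is run in the same session after the two SET statements above. The expected total of about 1,000 rows is taken from the question, not something Hive guarantees.

```sql
-- Recursive listing: nested directories show up with a leading 'd' in the permissions column.
dfs -ls -R /location/of/the/file/;

-- Run in the same session, after the two SET statements,
-- then compare the total with the expected ~1,000 rows.
SELECT COUNT(*) AS total_rows
FROM database.tablename;
```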
notNull
  • Hi Shu, thanks for the response. I believe the answer in this case is that there aren't multiple directories. All 10 files are in the same '/location/of/the/file/', with "file" being the folder where all 10 files reside. Should I add the parameters you provided above? Or what other edits should I make to the original script? – Rspktcod Jun 10 '19 at 21:44