1

I need to import data from a public s3 bucket which url is shared with me. how to load the data into hive table? I have tried below command but its not working:

create external table airlines_info (.... ) row format 
delimited fields terminated by '|' lines terminated by '\n'
stored as textfile location 'https://ml-cloud-dataset.....*.txt';

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:ml-cloud-dataset.s3.amazonaws.com/Airlines_data.txt is not a directory or unable to create one)

I am very new to hive and I am not sure about the code. I also tried below code after creating the table to load the data into hive table but that's also not working

load data inpath 'https://ml-cloud-dataset.....*.txt' into table airlines_info;
leftjoin
  • 36,950
  • 8
  • 57
  • 116
  • not working... what's the error message? – mck Jan 02 '21 at 13:43
  • for the first code i got following error msg FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:https://ml-cloud-dataset.s3.amazonaws.com/Airlines_data.txt is not a directory or unable to create one) – BHAWNA GUPTA Jan 02 '21 at 13:45

1 Answers1

0

Table location should be directory in HDFS or S3, not file and not https link.

Download file manually, put into local filesystem and if you already have the table created then use

load data local inpath  'local_path_to_file' into table airlines_info;

If you do not have the table yet, create it and specify some location inside your S3, or alternatively create MANAGED table (remove EXTERNAL from your DDL), without location specified, it will create location for you, check location using DESCRIBE FORMATTED command, later you can convert table to EXTERNAL if necessary using ALTER TABLE airlines_info SET TBLPROPERTIES('EXTERNAL'='TRUE');

Instead of load data command you can simply copy file into table location using AWS CLI (provide correct local path and table directory S3 URL):

aws s3 cp C:\Users\My_user\Downloads\Airlines_data.txt s3://mybucket/path/airlines_info/
leftjoin
  • 36,950
  • 8
  • 57
  • 116