I am using Qubole to run Presto queries.

I need to upload a CSV file so that I can query it, but I cannot figure out how to do this.

Does anyone have any experience with this?

For more details, I am working under the Analyze section in Qubole.


This is what I have so far, based on @leftjoin's answer:

use adhoc;
create external table adhoc.test(
  Media_Buy_Key string,
  Day string,
  DSP_Publisher string,
  Final_Media_Cost string
)
row format delimited
fields terminated by ','
lines terminated by '\n'
location 's3://bucket/folder/folder/file.csv/';

I then run the Hive query and it comes up as [Empty].
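
For reference, the check query I run (following the verification step in the answer below, with the schema and table from the DDL above) is something like:

select * from adhoc.test limit 10;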

This is what my S3 bucket looks like: [screenshot of the S3 bucket contents]

  • Remove the filename from the table location: s3://bucket/folder/folder/. Drop the table and create it once more with the location fixed. Also check that the field delimiter in your file is a comma and fix it accordingly; it can be some other character, which you would specify in the table DDL – leftjoin Aug 27 '18 at 18:14
  • It still comes up as empty unfortunately. Added a screenshot of my S3 bucket as well. @leftjoin – nak5120 Aug 27 '18 at 18:23
  • Use the Qubole shell command to check the table location. Executing hadoop fs -ls s3://bucket/folder/ should show your file – leftjoin Aug 27 '18 at 18:30
  • It still comes up empty unfortunately, even after I find the file location – nak5120 Aug 27 '18 at 19:01
  • Realized I didn't have access to the S3 bucket in Qubole. Thanks for the help! – nak5120 Aug 27 '18 at 19:41
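
Putting leftjoin's comments above together, a corrected version of the DDL from the question, with the filename dropped from the location (keeping the question's placeholder bucket and folder names), would look roughly like this:

use adhoc;

-- per leftjoin's comment: drop the old table and recreate it,
-- pointing the location at the folder rather than the file itself
drop table if exists adhoc.test;

create external table adhoc.test(
  Media_Buy_Key string,
  Day string,
  DSP_Publisher string,
  Final_Media_Cost string
)
row format delimited
fields terminated by ','
lines terminated by '\n'
location 's3://bucket/folder/folder/';

As the last comment notes, the table can still come back as [Empty] if Qubole does not have access to the S3 bucket, so the location fix alone may not be enough.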

1 Answer


Presto uses the Hive metastore to get table information and its data location.

  1. Upload the file to some S3 location. Strictly speaking, S3 has no directories; they are emulated using key names containing '/'. Upload the file using the Qubole S3 interface, say into s3://your-bucket-name/your-location/yourfile.csv. The location here is s3://your-bucket-name/your-location. If the file is already in S3, you can copy it to a new location using the aws s3 cp command.

  2. Using Hive, create an external table on top of your file location.

use your_schema;
create external table test(
  col1 string,
  col2 string,
  ...
  coln type
)
row format delimited
fields terminated by ','
lines terminated by '\n'
location 's3://your-bucket-name/your-location/';

Check that it works in Hive:

select * from your_schema.test limit 10;
  3. Use Presto to query your table:

select * from your_schema.test limit 10;
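
If Presto reports that the schema cannot be found, the table may need to be qualified with the Hive connector's catalog name as well; assuming that catalog is named hive (a common default, but an assumption here), the fully qualified form would be:

-- 'hive' is assumed to be the name of the Hive connector catalog in Presto
select * from hive.your_schema.test limit 10;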

  • Thanks! I was able to upload the data to the S3 bucket. When I tried running the first Hive query, `use your_schema` etc., I ran it without specifying columns and just kept `test`. It resulted in an error saying - `log4j:WARN No such property [rollingPolicy] in org.apache.log4j.RollingFileAppender` Any idea why this may be? – nak5120 Aug 27 '18 at 17:46
  • @nak5120 This is not an error, this is a warning (WARN). Skip it. The real error is something else. And columns should be specified in the table DDL – leftjoin Aug 27 '18 at 17:57
  • Thanks, so I added the column names and was able to make the table. There was no data populated, though; it just comes up as [Empty]. – nak5120 Aug 27 '18 at 18:07
  • Also showed what I did in the question to provide more info @leftjoin – nak5120 Aug 27 '18 at 18:10