I am using Qubole to run Presto queries.

I need to upload a CSV file so that I can query it, but I cannot figure out how to do this.

Does anyone have any experience with this?

For more details, I am working under the Analyze section in Qubole.


This is what I have so far, based on @leftjoin's answer:

use adhoc;
create external table adhoc.test(
  Media_Buy_Key string,
  Day string,
  DSP_Publisher string,
  Final_Media_Cost string
)
row format delimited
fields terminated by ','
lines terminated by '\n'
location 's3://bucket/folder/folder/file.csv/';

I then run the Hive query and it comes up as [Empty].
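
For reference, the check query I run (following the verification step in the answer below, with the schema and table from the DDL above) is something like:

select * from adhoc.test limit 10;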

This is what my S3 bucket looks like: [screenshot of the S3 bucket contents]

  • Remove the filename from the table location: s3://bucket/folder/folder/. Drop the table and create it once more with the location fixed. Also check that the field delimiter in your file is a comma and fix it accordingly; it can be some other character, which you would specify in the table DDL – leftjoin Aug 27 '18 at 18:14
  • It still comes up as empty unfortunately. Added a screenshot of my S3 bucket as well. @leftjoin – nak5120 Aug 27 '18 at 18:23
  • Use the Qubole shell command to check the table location. Executing hadoop fs -ls s3://bucket/folder/ should show your file – leftjoin Aug 27 '18 at 18:30
  • It still comes up empty unfortunately, even after I find the file location – nak5120 Aug 27 '18 at 19:01
  • Realized I didn't have access to the S3 bucket in Qubole. Thanks for the help! – nak5120 Aug 27 '18 at 19:41
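
Putting leftjoin's comments above together, a corrected version of the DDL from the question, with the filename dropped from the location (keeping the question's placeholder bucket and folder names), would look roughly like this:

use adhoc;

-- per leftjoin's comment: drop the old table and recreate it,
-- pointing the location at the folder rather than the file itself
drop table if exists adhoc.test;

create external table adhoc.test(
  Media_Buy_Key string,
  Day string,
  DSP_Publisher string,
  Final_Media_Cost string
)
row format delimited
fields terminated by ','
lines terminated by '\n'
location 's3://bucket/folder/folder/';

As the last comment notes, the table can still come back as [Empty] if Qubole does not have access to the S3 bucket, so the location fix alone may not be enough.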

1 Answer


Presto uses the Hive metastore to get table information and its data location.

  1. Upload the file to some S3 location. Strictly speaking, S3 has no directories; they are emulated using key names containing '/'. Upload the file using the Qubole S3 interface, say into s3://your-bucket-name/your-location/yourfile.csv. The location here is s3://your-bucket-name/your-location. If the file is already in S3, you can copy it to a new location using the aws s3 cp command.

  2. Using Hive, create an external table on top of your file location.

use your_schema;
create external table test(
  col1 string,
  col2 string,
  ...
  coln type
)
row format delimited
fields terminated by ','
lines terminated by '\n'
location 's3://your-bucket-name/your-location/';

Check that it works in Hive:

select * from your_schema.test limit 10;
  3. Use Presto to query your table:

select * from your_schema.test limit 10;
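
If Presto reports that the schema cannot be found, the table may need to be qualified with the Hive connector's catalog name as well; assuming that catalog is named hive (a common default, but an assumption here), the fully qualified form would be:

-- 'hive' is assumed to be the name of the Hive connector catalog in Presto
select * from hive.your_schema.test limit 10;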

  • Thanks! I was able to upload the data to the S3 bucket. When I tried running the first Hive query, `use your_schema` etc., I ran it without specifying columns and just kept `test`. It resulted in an error saying - `log4j:WARN No such property [rollingPolicy] in org.apache.log4j.RollingFileAppender` Any idea why this may be? – nak5120 Aug 27 '18 at 17:46
  • @nak5120 This is not an error, this is a warning (WARN). Skip it. The real error is something else. And columns should be specified in the table DDL – leftjoin Aug 27 '18 at 17:57
  • Thanks, so I added the column names and was able to make the table. There was no data populated, though; it just comes up as [Empty]. – nak5120 Aug 27 '18 at 18:07
  • Also showed what I did in the question to provide more info @leftjoin – nak5120 Aug 27 '18 at 18:10