
I am following the instructions in the book Big Data Visualization (see https://www.amazon.com/Big-Data-Visualization-James-Miller/dp/1785281941).

Basically, the steps are:

a) Upload a large text file to the S3 directory /bigdatavizproject1/Input

b) Use AWS EMR to run a HiveSQL script (via Add Step)

c) The output should appear in the S3 directory /bigdatavizproject1/Output

See https://i.stack.imgur.com/lBnrH.png

The HiveSQL is as below:

CREATE TABLE thebigdatatable (logrecord VARCHAR(550));
LOAD DATA INPATH 's3://bigdatavizproject1/Input/weblog1 -2016_08_27_03.txt' INTO TABLE thebigdatatable;
SELECT substr(ltrim(rtrim(logrecord)), 20, 3) FROM thebigdatatable;

But I do not see any output in the S3 output directory.

https://i.stack.imgur.com/Spp5o.png shows the step status as Completed. When I click View jobs, nothing is listed.

Any comments would be greatly appreciated. Thanks

THIAM HUAT Tan
    Are you sure that's the whole HQL script? The last statement is just doing a select, I don't see where it would be storing any output to s3. Is it supposed to be a "create table as select" instead? – Brian R Armstrong Mar 05 '19 at 17:43
  • CREATE TABLE thebigdatatable (logrecord VARCHAR(550)); LOAD DATA INPATH 's3://bigdatavizproject1/Input/weblog1 -2016_08_27_03.txt' INTO TABLE thebigdatatable; select substr(ltrim(rtrim(logrecord)), 20, 3) from thebigdatatable; The above is exactly what I got from the book. May I ask: a) How do I view that HiveSQL table? b) What is the correct command to write the output to the S3 /Output directory? Thanks. – THIAM HUAT Tan Mar 08 '19 at 03:55
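
As a minimal sketch of what the first comment hints at: a bare SELECT only returns rows to the Hive session (in an EMR step these would typically end up in the step's stdout log, not in S3). One way to persist the result is Hive's INSERT OVERWRITE DIRECTORY statement. The table name, query, and bucket below are taken from the question; treating s3://bigdatavizproject1/Output/ as the intended target is an assumption, and this is not necessarily what the book intends.

```sql
-- a) Inspect the table from a Hive session
SHOW TABLES;
DESCRIBE thebigdatatable;
SELECT * FROM thebigdatatable LIMIT 10;

-- b) Write the query result to S3 instead of only returning it
--    (assumed target directory; its existing contents are replaced)
INSERT OVERWRITE DIRECTORY 's3://bigdatavizproject1/Output/'
SELECT substr(ltrim(rtrim(logrecord)), 20, 3)
FROM thebigdatatable;
```

If the step completes, the Output directory should then contain one or more result files written by Hive. A CREATE TABLE ... AS SELECT, as suggested in the comment, would be an alternative when the result should remain queryable as a table.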
