I am following the instructions in the book titled Big Data Visualization,
see https://www.amazon.com/Big-Data-Visualization-James-Miller/dp/1785281941
Basically, the steps are:
a) Load a huge text file into the S3 directory /bigdatavizproject1/Input
b) Use AWS EMR to run a HiveSQL script via Add Step
c) The script's output should appear in the S3 directory /bigdatavizproject1/Output
See https://i.stack.imgur.com/lBnrH.png
The HiveSQL is as below:
CREATE TABLE thebigdatatable (logrecord VARCHAR(550));
LOAD DATA INPATH 's3://bigdatavizproject1/Input/weblog1 -2016_08_27_03.txt' INTO TABLE thebigdatatable;
select substr(ltrim(rtrim(logrecord)), 20, 3) from thebigdatatable;
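One variant I have not tried yet: as far as I understand, a bare SELECT in a Hive step only prints its rows to the step's stdout log and does not write any files to S3. A minimal sketch of the same script that writes the result explicitly, assuming INSERT OVERWRITE DIRECTORY is the right way to target the Output directory (this statement is my assumption, not something taken from the book):

```sql
-- Same table and load as in the script above.
CREATE TABLE thebigdatatable (logrecord VARCHAR(550));

LOAD DATA INPATH 's3://bigdatavizproject1/Input/weblog1 -2016_08_27_03.txt'
INTO TABLE thebigdatatable;

-- Assumption: write the query result to the S3 Output directory explicitly,
-- instead of relying on a bare SELECT to produce output files.
-- Note: OVERWRITE replaces whatever is already in that directory.
INSERT OVERWRITE DIRECTORY 's3://bigdatavizproject1/Output/'
SELECT substr(ltrim(rtrim(logrecord)), 20, 3)
FROM thebigdatatable;
```

If this is the intended approach, I would expect one or more result files to appear under /bigdatavizproject1/Output after the step completes.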
But I do not see any output in the S3 output directory.
https://i.stack.imgur.com/Spp5o.png shows the step status as Completed. When I click View jobs, nothing is listed.
Any comments would be greatly appreciated. Thanks