I'm trying to use Shark on EMR, but I can't recover the partitions of a table whose location is set to an S3 bucket. I get nothing when I try to show the partitions:
shark> MSCK REPAIR TABLE logs ;
OK
Time taken: 1.79 seconds
shark> SHOW PARTITIONS logs ;
OK
Time taken: 0.073 seconds
I create my table like this:
SET hive.exec.dynamic.partition = true ;
SET hive.exec.dynamic.partition.mode = nonstrict ;
CREATE EXTERNAL TABLE IF NOT EXISTS logs (
time STRING,
thread STRING,
logger STRING,
identity STRING,
message STRING,
logtype STRING,
logsubtype STRING,
node STRING,
storageallocationstatus STRING,
nodelist STRING,
userid STRING,
nodeid STRING,
path STRING,
datablockid STRING,
hash STRING,
size STRING,
value STRING,
exception STRING,
server STRING,
app STRING,
version STRING
)
PARTITIONED BY (
dt STRING,
level STRING
)
ROW FORMAT
DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION 's3://my-log/parsed-logs/' ;
My log bucket contains one log file, located in s3://my-log/parsed-logs/dt=2014-01-03/level=ERROR/.
According to the Hive language manual, the MSCK REPAIR TABLE logs command should be equivalent to Amazon's Hive extension ALTER TABLE logs RECOVER PARTITIONS, but when I run it in Shark I get no visible partitions. I tried the exact same thing in Hive with ALTER TABLE logs RECOVER PARTITIONS and it worked like a charm:
hive> ALTER TABLE logs RECOVER PARTITIONS ;
OK
Time taken: 0.975 seconds
hive> SHOW PARTITIONS logs ;
OK
dt=2014-01-03/level=ERROR
Time taken: 0.078 seconds, Fetched: 1 row(s)
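For reference, a possible stopgap would be to register the single partition by hand instead of relying on discovery. This is only a sketch using the standard Hive ALTER TABLE ... ADD PARTITION statement with the partition values and path from above; it assumes Shark passes the statement through to the Hive metastore unchanged, which I have not verified.

```sql
-- Hypothetical workaround, not verified on Shark:
-- add the one known partition explicitly instead of discovering it.
ALTER TABLE logs ADD IF NOT EXISTS
PARTITION (dt = '2014-01-03', level = 'ERROR')
LOCATION 's3://my-log/parsed-logs/dt=2014-01-03/level=ERROR/' ;

-- Check whether the metastore now lists the partition.
SHOW PARTITIONS logs ;
```

This would not scale to many partitions, so it does not replace a working MSCK REPAIR TABLE.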
Am I missing something here when I'm using Shark?