I'm running a Hive script on EMR that creates a partitioned Parquet table in S3 from a ~40GB gzipped CSV file, also stored in S3.
The script runs fine for about 4 hours, then fails at a point where, I'm fairly sure, it has almost finished creating the Parquet table. The logs show the following error:
HiveException: Hive Runtime Error while processing row
caused by:
AmazonS3Exception: Bad Request
I can't see any more useful information in the logs. The script reads the CSV file from S3 without problems, and it also creates a couple of metadata files in S3, so I've confirmed the instance has read/write permissions on the bucket.
I can't think of anything else that could be going on, and I wish the logs said more about which request Hive is making to S3 that comes back as "Bad Request". Does anyone have any ideas?