I'm running a relatively large MapReduce job on Amazon Elastic MapReduce (EMR).
I've run the job many times on small datasets without any problems, but when I run it on a large dataset I get the following exception:
Error: com.amazonaws.AmazonClientException: Unable to verify integrity of data download. Client calculated content length didn't match content length received from Amazon S3. The data may be corrupt.
I searched for the error, and the only recommendation I found was to set the following system property:
System.setProperty("com.amazonaws.services.s3.disableGetObjectMD5Validation","true");
That didn't help at all.
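A minimal sketch of what that property does and does not reach, under two assumptions worth checking. First, `System.setProperty` only changes the JVM that calls it. Hadoop map and reduce tasks run in their own JVMs, so setting it in the job driver would not affect the tasks that actually read from S3. One way to reach them is to pass the property as a `-D` option to the task JVMs, for example through `mapred.child.java.opts` on Hadoop 1, or `mapreduce.map.java.opts` and `mapreduce.reduce.java.opts` on Hadoop 2, appended to any options already set there. Second, the property name refers to MD5 validation, while the error reports a content-length mismatch, so the property may not disable this particular check at all. The class name `DisableMd5Check` and its methods are hypothetical.

```java
// Hypothetical helper, for illustration only (assumption: the S3 reads that
// fail happen inside map/reduce task JVMs, not in the job driver).
public class DisableMd5Check {
    // Property name taken from the suggestion quoted above.
    static final String PROPERTY =
        "com.amazonaws.services.s3.disableGetObjectMD5Validation";

    // Sets the property in the *current* JVM only and reports whether it is visible here.
    // It has no effect on any other JVM, such as separately launched task JVMs.
    static boolean disableInThisJvm() {
        System.setProperty(PROPERTY, "true");
        return "true".equals(System.getProperty(PROPERTY));
    }

    // Builds the equivalent JVM command-line flag, which is how a system
    // property is normally handed to a separate (child) JVM at startup.
    static String asJvmOption() {
        return "-D" + PROPERTY + "=true";
    }

    public static void main(String[] args) {
        System.out.println(disableInThisJvm());
        System.out.println(asJvmOption());
    }
}
```

Running `main` prints `true` followed by the `-D` flag. The first line shows that the property is set for the JVM running this code; the flag string is what would have to be handed to each task JVM for the setting to apply there.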
The cluster uses an HDFS replication factor of 3, with 11 m1.large data nodes and one m1.medium master node.
Is there a known fix or workaround for this issue?