I'm unable to read a simple test file on S3 from an interactive Pig job flow (Hadoop, Elastic MapReduce), and I'm not sure why.
I have two S3 buckets. Let's call them unmounted_bucket and mounted_bucket. Both buckets were initially created through the AWS web interface (if that matters).
I have an EC2 Linux instance running that has mounted_bucket mounted under /mnt/s3drive.
I have a test file called threecolumntest.txt that contains the following test data (it's actually tab-delimited):
col1 col2 col3
one two three
four five six
seven eight nine
I have this file in both unmounted_bucket and mounted_bucket. I uploaded it to each bucket through the AWS S3 web interface (management console).
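For what it's worth, here is a sketch of how the same file could be recreated locally with explicit tab characters, to rule out a delimiter problem (the file name is from above; the exact bytes I uploaded may of course differ):

```shell
# Recreate the test file with explicit tabs (\t) between columns.
printf 'col1\tcol2\tcol3\none\ttwo\tthree\nfour\tfive\tsix\nseven\teight\tnine\n' > threecolumntest.txt
# cat -A renders tabs as ^I, so the delimiters are visible.
cat -A threecolumntest.txt
```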
From the interactive job flow (I'm using PuTTY), I can run these commands with no problem:
A = load 's3://unmounted_bucket/threecolumntest.txt' using PigStorage() as (c1: chararray, c2: chararray, c3: chararray);
illustrate A;
The output is as expected.
However, if I run the same commands pointed at the other bucket, I get an error:
A = load 's3://mounted_bucket/threecolumntest.txt' using PigStorage() as (c1: chararray, c2: chararray, c3: chararray);
illustrate A;
[main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2997: Encountered IOException. Exception : Internal error creating job configuration.
I checked through the web interface and each bucket has the same permissions (as far as I can tell). This is definitely outside of my wheelhouse, so I am uncertain of what could be causing this, or what I should check next. I'm wondering if it has to do with one bucket being mounted (and if so, why?), since this is using the same file, and I uploaded the file to both buckets using the AWS web UI. The mounting piece seems to be the difference at this point. Perhaps I'm missing something else?