I'm unable to read a simple test file on S3 from an interactive Pig job flow (Hadoop, Elastic MapReduce), and I'm not sure why.
I have two S3 buckets. Let's call them unmounted_bucket and mounted_bucket. Both buckets were initially created through the AWS web interface (if that matters).
I have an EC2 Linux instance running that has mounted_bucket mounted under /mnt/s3drive.
I have a test file called threecolumntest.txt that contains the following test data (it's actually tab-delimited):
col1 col2 col3
one two three
four five six
seven eight nine
I have this file in both unmounted_bucket and mounted_bucket. I uploaded it to each bucket through the AWS S3 web interface (management console).
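For what it's worth, here is a sketch of how the same file could be recreated locally with explicit tab characters, to rule out a delimiter problem (the file name is from above; the exact bytes I uploaded may of course differ):

```shell
# Recreate the test file with explicit tabs (\t) between columns.
printf 'col1\tcol2\tcol3\none\ttwo\tthree\nfour\tfive\tsix\nseven\teight\tnine\n' > threecolumntest.txt
# cat -A renders tabs as ^I, so the delimiters are visible.
cat -A threecolumntest.txt
```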
From the interactive job flow (I'm using PuTTY), I can run these commands with no problem:
A = load 's3://unmounted_bucket/threecolumntest.txt' using PigStorage() as (c1: chararray, c2: chararray, c3: chararray);
illustrate A;
The output is as expected.
However, if I run the same commands pointed at the other bucket, I get an error:
A = load 's3://mounted_bucket/threecolumntest.txt' using PigStorage() as (c1: chararray, c2: chararray, c3: chararray);
illustrate A;
[main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2997: Encountered IOException. Exception : Internal error creating job configuration.
I checked through the web interface and each bucket has the same permissions (as far as I can tell). This is definitely outside of my wheelhouse, so I am uncertain of what could be causing this, or what I should check next. I'm wondering if it has to do with one bucket being mounted (and if so, why?), since this is using the same file, and I uploaded the file to both buckets using the AWS web UI. The mounting piece seems to be the difference at this point. Perhaps I'm missing something else?