I'm running a Pig script on EMR that reads data stored in Avro format. It worked locally, but to get other parts of the script to work on EMR, I had to downgrade the piggybank.jar I was using from 0.10.0 to 0.9.2. After making that change, AvroStorage silently fails to read any data and returns zero records, and nothing relevant shows up in the logs. Here's the script:

REGISTER ../../../lib/avro-1.7.0.jar
REGISTER ../../../lib/json-simple-1.1.1.jar
REGISTER ../../../lib/jackson-core-asl-1.5.2.jar
REGISTER ../../../lib/jackson-mapper-asl-1.5.2.jar
REGISTER ../../../lib/piggybank.jar
a = LOAD '/data/' USING org.apache.pig.piggybank.storage.avro.AvroStorage();
DUMP a;

And again, if piggybank.jar is version 0.10.0, it works. If it is version 0.9.2, it does not. Should I be using a different version of any of the other libraries? I tried with avro-1.5.3.jar, and that also did not work.

Another note: if I run DESCRIBE a; it correctly outputs the schema.

Joe K

2 Answers

You have probably considered this already, but it might be quicker to change the parts of your Pig script that depend on 0.9.2 so that they work on 0.10.0.

seedhead

Not sure if this is still an issue for you, but a set of registers I use is:

REGISTER s3://..path../lib/piggybank-0.10.0.jar;
REGISTER file:/home/hadoop/lib/pig/piggybank.jar;
REGISTER s3://..path../lib/avro-1.7.1.jar;
REGISTER s3://..path../lib/jackson-core-2.0.6.jar;
REGISTER s3://..path../lib/jackson-mapper-lgpl-1.9.9.jar;
REGISTER s3://..path../lib/json-simple-1.1.1.jar;
REGISTER s3://..path../lib/joda-time-2.1.jar;
REGISTER s3://..path../lib/snappy-java-1.0.4.1.jar;

You can register both piggybank jars together. There's some weirdness in how the piggybank-0.10.0 jar interacts with the other piggybank jar (it seems to be order-sensitive), but hopefully this helps, or at least gives you something else to try.
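
One way to look into the order sensitivity is to check which of the registered jars actually bundle the AvroStorage class. The sketch below is only a diagnostic under assumptions: it assumes the jars have been copied to the local filesystem, the helper name `jars_containing` is made up for illustration, and whether duplicate copies of the class are really behind the behavior is a guess, not something confirmed in this thread.

```python
import zipfile

# Path of the class file inside a jar, derived from the loader name
# used in the question (org.apache.pig.piggybank.storage.avro.AvroStorage).
AVRO_STORAGE = "org/apache/pig/piggybank/storage/avro/AvroStorage.class"


def jars_containing(class_path, jar_paths):
    """Return the jars, in the order given, whose archive contains class_path.

    Hypothetical helper: a jar is a zip archive, so we only look at its
    list of entry names and never load any Java code.
    """
    hits = []
    for jar in jar_paths:
        with zipfile.ZipFile(jar) as zf:
            if class_path in zf.namelist():
                hits.append(jar)
    return hits
```

For example, calling `jars_containing(AVRO_STORAGE, ["piggybank-0.10.0.jar", "piggybank.jar"])` on local copies of the two jars lists every jar that ships its own copy of the class. If more than one shows up, the REGISTER order is a plausible thing to experiment with.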

awshepard