
I'm running an Amazon Elastic MapReduce (EMR) job using Pig. I'm having trouble importing the json or simplejson modules into my Python user-defined function (UDF).

Here is my code:

#!/usr/bin/env python
import simplejson as json
@outputSchema('m:map[]')
def flattenJSON(text):
    j = json.loads(text)
    ...
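The body of `flattenJSON` is elided in the question. For illustration only, here is a hypothetical helper, `flatten_dict`, showing one common way to turn a parsed, nested JSON object into the single-level dictionary that a Pig `map[]` expects. Joining nested keys with a dot is an assumption, not something the question specifies. The helper sticks to Python 2.5-compatible syntax (no dict comprehensions), so it would also run under older Jython.

```python
def flatten_dict(d, parent_key='', sep='.'):
    """Flatten nested dicts into one level, joining keys with `sep`.

    Hypothetical helper: the real flattenJSON body is elided in the question.
    """
    items = {}
    for key, value in d.items():
        # Build the dotted key, e.g. 'b' + '.' + 'c' -> 'b.c'.
        new_key = parent_key + sep + str(key) if parent_key else str(key)
        if isinstance(value, dict):
            # Recurse into nested objects and merge their flattened keys.
            items.update(flatten_dict(value, new_key, sep))
        else:
            items[new_key] = value
    return items
```

Inside the UDF this would be called as `return flatten_dict(json.loads(text))`, which still depends on a JSON module being importable.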

When I try to register the function in Pig, I get an error saying "No module named simplejson":

grunt> register 's3://chopperui-emr/code/flattenDict.py' using jython as flatten;
2015-05-31 16:53:43,041 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1121: Python Error. Traceback (most recent call last):
File "/tmp/pig6071834754384533869tmp/flattenDict.py", line 32, in <module>
import simplejson as json
ImportError: No module named simplejson

However, my Amazon AMI includes Python 2.6, which ships json in its standard library (using import json in the UDF doesn't work either). Also, if I try to install simplejson using pip, it says it's already installed (on both the master and core nodes).

[hadoop@ip-172-31-46-71 ~]$ pip install simplejson
Requirement already satisfied (use --upgrade to upgrade): simplejson in /usr/local/lib64/python2.6/site-packages

Also, it works fine if I run Python interactively from the command line on the master node:

[hadoop@ip-172-31-46-71 ~]$ python
Python 2.6.9 (unknown, Apr  1 2015, 18:16:00) 
[GCC 4.8.2 20140120 (Red Hat 4.8.2-16)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import json
>>> 

There must be something different about how EMR or Pig is setting up the Python environment, but what?

mostlyjason

1 Answer

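Pig UDFs registered `using jython` run under Jython, a Java implementation of Python, not under the node's CPython. Jython does not see the modules pip installed into CPython's site-packages, so simplejson is not available to it, and Jython 2.5.x, the version bundled with Pig releases of that era, predates the json standard-library module, which is why import json fails too.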
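A quick way to confirm which interpreter Pig is actually using is a throwaway diagnostic UDF (the name `interpreterInfo` is hypothetical) that returns the platform, version, and module search path. The `try`/`except NameError` block assumes, as with Pig's Jython engine, that `outputSchema` is supplied as a global; it only defines a no-op stand-in when the name is missing, so the same file can also be run with CPython for comparison.

```python
import sys

try:
    outputSchema  # assumed to be provided by Pig's Jython script engine
except NameError:
    def outputSchema(schema):
        """No-op stand-in so the file also runs outside Pig."""
        def decorator(func):
            return func
        return decorator


@outputSchema('info:chararray')
def interpreterInfo(ignored):
    """Return the interpreter's platform, version, and module search path."""
    return 'platform=%s; version=%s; path=%s' % (
        sys.platform,
        sys.version.replace('\n', ' '),
        ':'.join([str(p) for p in sys.path]),
    )
```

Registered `using jython` and called from any `foreach ... generate`, it should report a platform string starting with `java` and a path that does not include the CPython site-packages directory where pip installed simplejson; run with CPython on the master node, it reports `linux2` instead.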

You can try Jyson, a JSON codec written for Jython, as the JSON parser instead.
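A minimal sketch of an import chain that works under either interpreter: it tries the standard-library `json` first, then `simplejson`, and finally Jyson's `JysonCodec` from the `com.xhaus.jyson` package, following Jyson's own usage examples. It assumes the Jyson jar has been made visible to Jython, for example by registering it in the Pig script before the Python file (the jar's name and location are up to you). The helper name `parse_json` is hypothetical.

```python
# Bind the name `json` to whichever JSON implementation is importable.
try:
    import json  # standard library in CPython 2.6+ and Jython 2.7+
except ImportError:
    try:
        import simplejson as json  # only if installed for this interpreter
    except ImportError:
        # Assumption: the Jyson jar is registered in Pig, so its Java
        # classes are on the classpath that Jython imports from.
        from com.xhaus.jyson import JysonCodec as json


def parse_json(text):
    """Parse a JSON string with whichever module was imported above."""
    return json.loads(text)
```

Inside `flattenJSON`, the existing `json.loads(text)` call then works unchanged, because every branch binds `json` to something with a `loads` method. Under Pig's Jython the first two imports are expected to fail, so parsing falls through to Jyson.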

FtoTheZ