Run Pig with Lipstick on AWS EMR

Question

I'm running an AWS EMR Pig job using script-runner.jar as described here: http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-hadoop-script.html

Now I want to hook up Netflix's Lipstick to monitor my scripts. I set up the server, but I can't quite figure out how to do the last step in the wiki (https://github.com/Netflix/Lipstick/wiki/Getting-Started):

hadoop jar lipstick-console-[version].jar -Dlipstick.server.url=http://$LIPSTICK_URL

Should I substitute script-runner.jar with this?

Also, after following the build process in the wiki, I ended up with three different console jars:

lipstick-console-0.6-SNAPSHOT.jar
lipstick-console-0.6-SNAPSHOT-withHadoop.jar
lipstick-console-0.6-SNAPSHOT-withPig.jar

What is the purpose of the latter two jars?

UPDATE:

I think I'm making progress, but it still does not seem to work.

I set the pig.notification.listener parameter and the Lipstick server URL. There is more than one way to do this on EMR. Since I am using the Ruby API, I had to specify them as properties on the step:

hadoop_jar_step:
  jar: 's3://elasticmapreduce/libs/script-runner/script-runner.jar'
  properties: 
    - pig.notification.listener.arg: com.netflix.lipstick.listeners.LipstickPPNL
    - lipstick.server.url: http://pig_server_url

Next, I added lipstick-console-0.6-SNAPSHOT.jar to the Hadoop classpath. For this, I had to create a bootstrap action as follows:

bootstrap_actions:
  - name: copy_lipstick_jar
    script_bootstrap_action:
      path: #s3 path to bootstrap_lipstick.sh

where the contents of bootstrap_lipstick.sh are:

#!/bin/bash
hadoop fs -copyToLocal s3n://wp-data-west-2/load_code/java/lipstick-console-0.6-SNAPSHOT.jar /home/hadoop/lib/

The bootstrap action copies the Lipstick jar to the cluster nodes, and /home/hadoop/lib/ is already on the Hadoop classpath (EMR takes care of that).

It still does not work, but I think I am missing something really minor ... Any ideas appreciated.

Thanks!

score 2 · Answer 1 · answered Oct 20 '14 at 18:22

Currently, Lipstick's Main class is a drop-in replacement for Pig's Main class. This is a hack (and far from ideal) that gives Lipstick access to the logical and physical plans for your script before and after optimization, which are simply not accessible otherwise. As such, it unfortunately won't work to just register the LipstickPPNL class as a PPNL for Pig. You've got to run Lipstick's Main as though it were Pig.

I have not tried to run Lipstick on EMR, but it looks like you're going to need to use a custom jar step, not a script step. See the docs here: http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-launch-custom-jar-cli.html

The jar to use would be lipstick-console-0.6-SNAPSHOT-withHadoop.jar, which contains all the dependencies needed to run Lipstick. Additionally, lipstick.server.url will need to be set.
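
As a rough, untested sketch of what that step would end up running, here is the command from the Lipstick wiki applied to the withHadoop jar. The server URL and the script path are hypothetical placeholders, and passing the script with Pig's -f option assumes that Lipstick's Main accepts the same arguments as Pig's Main. The snippet only assembles and prints the command so you can check it before running anything:

```shell
#!/bin/bash
# Sketch only: builds the command for running Lipstick's Main in place of Pig.
# The server URL and script path are hypothetical placeholders; the jar location
# assumes the jar was copied to /home/hadoop/lib/ as in the question's bootstrap action.
LIPSTICK_JAR="/home/hadoop/lib/lipstick-console-0.6-SNAPSHOT-withHadoop.jar"
LIPSTICK_SERVER_URL="http://lipstick.example.com"
PIG_SCRIPT="/home/hadoop/my_script.pig"

# Assumption: Lipstick's Main takes Pig's usual arguments, so the script is
# passed with Pig's -f option after the -D property.
lipstick_cmd() {
  printf 'hadoop jar %s -Dlipstick.server.url=%s -f %s' "$1" "$2" "$3"
}

CMD="$(lipstick_cmd "$LIPSTICK_JAR" "$LIPSTICK_SERVER_URL" "$PIG_SCRIPT")"
echo "$CMD"      # dry run: inspect the command first
# eval "$CMD"    # uncomment on the master node to actually launch the job
```

In a custom jar step, the jar would be the step's jar and everything after it would become the step's arguments.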

Alternatively, you might take a look at https://www.mortardata.com/, which runs on EMR and has Lipstick integration built in.
