3

I am connected to an AWS EMR v5.4.0 instance over SSH and I want to call s3distcp. This link demonstrates how to setup an emr step to call it, but when I run it I get the following error:

Container launch failed for container_1492469375740_0001_01_000002 : org.apache.hadoop.yarn.exceptions.InvalidAuxServiceException: The auxService:mapreduce_shuffle does not exist
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:168)
    at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)
    at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:155)
    at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:390)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

I followed the instructions here but it still didn't work.

Community
  • 1
  • 1
Mark J Miller
  • 4,751
  • 5
  • 44
  • 74

1 Answers1

2

It turns out I needed to restart the yarn nodemanager service after configuring mapreduce_shuffle:

$ initctl list | grep yarn
hadoop-yarn-resourcemanager start/running, process 1256
hadoop-yarn-proxyserver start/running, process 702
hadoop-yarn-nodemanager start/running, process 896
$ sudo stop hadoop-yarn-nodemanager
$ sudo start hadoop-yarn-nodemanager

Also, in case it helps the yarn-site.xml file was located at: /etc/hadoop/conf/yarn-site.xml. It already had an entry for yarn.nodemanager.aux-services but mapreduce_shuffle wasn't configured:

<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>spark_shuffle,</value>
</property>

<property>
  <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
  <value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>

So I added it like this:

<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>spark_shuffle,mapreduce_shuffle</value>
</property>

<property>
  <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>

<property>
  <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
  <value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>
Mark J Miller
  • 4,751
  • 5
  • 44
  • 74
  • I wonder how you would do this though if you're spinning up an EMR cluster on the fly using something like AWS Data Pipeline or Apache Airflow because you can't go into the Emr cluster and modify it like this when you're spinning it up, running an EMR activity, then tearing it down. Not sure how I'm gonna fix this error in my scenario... – Kyle Bridenstine May 21 '18 at 17:44