
I have installed RStudio 3.1 on Hortonworks Hadoop.

Currently my Hadoop Streaming environment variable is set in the shell with:

export HADOOP_STREAMING=/usr/lib/hadoop-mapreduce/hadoop-streaming.jar

I get the following error when running a simple MapReduce job from RStudio:

Error in hadoop.streaming() : Please make sure that the env. variable HADOOP_STREAMING is set

Can anybody tell me the correct path for the hadoop-streaming jar file? Thanks.

Tyrone Williams

1 Answer


It depends on where your Hadoop libraries are installed. For instance, if you're using the Cloudera distribution, you can run the following inside R:

Sys.setenv(HADOOP_STREAMING = "/opt/cloudera/parcels/CDH-5.1.3-1.cdh5.1.3.p0.12/lib/hadoop-mapreduce/hadoop-streaming.jar")
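
Since the right path depends on the distribution, it can help to search for the jar rather than guess. The following is only a rough sketch: `find_streaming_jar` is a hypothetical helper name, and the candidate directories are assumptions (`/usr/lib/hadoop-mapreduce` and `/opt/cloudera/parcels` come from the paths in this thread, `/usr/hdp` is assumed to be the install root on Hortonworks), so adjust them to your system.

```shell
#!/bin/sh
# Hypothetical helper: print the first hadoop-streaming jar found
# under any of the directories passed as arguments.
find_streaming_jar() {
    for dir in "$@"; do
        # Skip candidates that do not exist on this machine.
        [ -d "$dir" ] || continue
        jar=$(find "$dir" -type f -name 'hadoop-streaming*.jar' 2>/dev/null | sort | head -n 1)
        if [ -n "$jar" ]; then
            printf '%s\n' "$jar"
            return 0
        fi
    done
    return 1
}

# Candidate roots are assumptions; edit them for your distribution.
if jar=$(find_streaming_jar /usr/lib/hadoop-mapreduce /usr/hdp /opt/cloudera/parcels); then
    echo "export HADOOP_STREAMING=$jar"
else
    echo "no hadoop-streaming jar found under the candidate directories" >&2
fi
```

If a path is printed, pass it to Sys.setenv() inside R as shown above. A variable exported in a login shell may not be visible to the R session that RStudio starts, which could explain the error even when the path itself is correct.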
Theofilos Papapanagiotou