Questions tagged [rhadoop]

RHadoop is a combination of R and Hadoop for managing and analyzing data with Hadoop.

RHadoop is a collection of three R packages that allow users to manage and analyze data with Hadoop. The packages have been implemented and tested on Cloudera's Distribution of Hadoop (CDH3 and CDH4) and R 2.15.0. The packages have also been tested with Revolution R 4.3, 5.0, and 6.0. For rmr, see Compatibility.

Source: Github: Revolution Analytics (RHadoop)

112 questions
1 vote, 0 answers

Calling mapreduce from Shiny server

I am trying to parallelize my program using RHadoop. I am using Shiny Server to plot my data after running mapreduce from the rmr library. The mapreduce script works fine in R, but I am facing issues when calling it from Shiny Server. My…
rrmum
1 vote, 1 answer

Configuring the environment variable HADOOP_STREAMING for RStudio

I have installed RStudio 3.1 on Horton Hadoop. Currently my Hadoop Streaming environment variable is set with this path:
export HADOOP_STREAMING=/usr/lib/hadoop-mapreduce/hadoop-streaming.jar
I get an error when executing a simple mapreduce using…
1 vote, 2 answers

How do you change the maximum container capability in a Hadoop cluster?

I installed RHadoop on a Hortonworks Sandbox, following these instructions: http://www.research.janahang.com/install-rhadoop-on-hortonworks-hdp-2-0/
Everything seems to have installed correctly, but when I run the test script at the bottom I get an…
user3357415
1 vote, 0 answers

How to read an HDFS file as an input matrix: getting error "Error in FUN(X[[2L]], ...) : Sorry, parameter type `NA' is ambiguous or not supported."

When I read an HDFS file as the input matrix for the mapreduce function (from the rmr2 package) in my R script, I get the error below.
> r.file <- hdfs.file("hdfs://X.X.X.X:NNNN/somnath/merged_train/part-m-00000","r")
> input =…
somnathchakrabarti
1 vote, 0 answers

How to do dimension reduction on a training data set using R mapreduce?

I am working with the RHadoop rhdfs package to perform dimension reduction on a CSV input file with a large number of columns. The output would be a selected subset of the columns. To keep it simple, I am trying to take just the first 5 columns of the CSV…
somnathchakrabarti
1 vote, 1 answer

using hdfs.file() gives an error: attempt to apply non-function

I have just installed rhdfs and wanted to check how it works. I tried the code below:
library(hdfs)
mod <- 2
model <- hdfs.file(mod)
I am getting an error:
Error in hdfs.file(mod) : attempt to apply non-function
Could anyone please help…
user3279174
1 vote, 2 answers

Error in mr(map = map, reduce = reduce, combine = combine, vectorized.reduce, : hadoop streaming failed with error code 1 Calls: mapreduce -> mr

I am running the Rscript gdp.R below:
#!/usr/bin/env Rscript
Sys.getenv(c("HADOOP_HOME", "HADOOP_CMD", "HADOOP_STREAMING", "HADOOP_CONF_DIR"))
library(rmr2)
library(rhdfs)
setwd("/root/somnath/GDP_data/")
gdp <-…
somnathchakrabarti
1 vote, 0 answers

Linear Regression Using RHadoop (Mapreduce)

I'm new to RHadoop and also to rmr. I had a requirement to write a MapReduce job in R. I have tried writing one, but executing it gives an error. I'm trying to read the file from HDFS. I know how to do this in R:
output <- …
user3782364
1 vote, 1 answer

RHadoop job failing on a single-node Ubuntu cluster

I am posting a similar question a second time because I believe I now have a much more precise view of the problem. Environment: Hadoop 2.2.0 running as a single-node cluster on an Ubuntu 14.04 laptop. RStudio version 0.98.507, R version…
Calcutta
1 vote, 1 answer

Hadoop streaming failed with error code 1: trying rmr (RHadoop R package) with DataStax Cassandra

I need clarification regarding rmr + rhdfs (RHadoop) with DataStax Cassandra (CFS). Currently all the functions in rhdfs and rmr (to.dfs(), from.dfs()) are working, but when I try to run mapreduce(), the error below occurs:
streaming command failed!.Hadoop…
Saranya
1 vote, 4 answers

Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.6: run (pre-dist) on project hadoop-project-dist:

I need help; I have been trying to figure this out for the last 2-3 days. I am setting up Hadoop on a Windows 7 (64-bit) machine to try out the integration of R with Hadoop. I followed the Hadoop installation instructions given at the URL…
user3305610
1 vote, 2 answers

Installing rmr2 in RHadoop

Can you please help me fix an issue installing rmr2? I am new to RHadoop. I am using R version 3.0.2 and downloaded rmr2_2.3.0.tar.gz on root. Please check:
install.packages("rmr2_2.3.0.tar.gz")
Installing package into ‘/usr/lib64/R/library’ (as…
user3176378
1 vote, 1 answer

Hadoop streaming fails in R

I am running the RHadoop sample script to test the system, using the following…
LonelySoul
1 vote, 2 answers

Failed to remotely execute an R script that loads the library "rhdfs"

I'm working on a project using RHadoop and ran into this problem. I'm using JSch in Java to SSH into a remote Hadoop pseudo-cluster; here is part of the Java code that creates the connection:
/* Create a connection instance */
Connection conn = new…
Hao Huang
1 vote, 2 answers

RHive not working with CDH4

Has anyone gotten RHive to work with CDH4? Is it compatible with CDH4? I have tried asking this question on their Google group but have had no answers yet. I have installed R, RHadoop, and all related packages on CDH4, but I am stuck on RHive. Using CDH4…
Kumar Vaibhav