Questions tagged [rhadoop]

RHadoop is a combination of R and Hadoop for managing and analyzing data with Hadoop.

RHadoop is a collection of three R packages that allow users to manage and analyze data with Hadoop. The packages have been implemented and tested on Cloudera's Distribution of Hadoop (CDH3 and CDH4) and R 2.15.0. The packages have also been tested with Revolution R 4.3, 5.0, and 6.0. For rmr, see Compatibility.

Source: GitHub, Revolution Analytics (RHadoop)

112 questions
0 votes, 0 answers

How do I retrieve TBs of data from HDFS using the rhdfs package?

How can I retrieve TBs of data from HDFS using the rhdfs package, given that the data is stored on multiple machines while R runs on a single machine? How can this much data be stored in an R data frame on a single system? If so, how can that huge data is…
Prabhat Jain
0 votes, 1 answer

Accessing RStudio server on Cloudera VM running on Ubuntu Host

What I would like to do: access the RStudio web GUI, running on a Cloudera QuickStart VM on an Ubuntu host, from an OS X browser. This is what works: 1. The Ubuntu host is running the Cloudera QuickStart VM. 2. The Cloudera VM has R and RStudio Server installed and…
0 votes, 1 answer

RHadoop vs. Apache Mahout

I want to start developing a recommendation system for big data, say 2 GB of log data per day. For this purpose, which is preferable, RHadoop or Apache Mahout? Please answer from different aspects, such as availability of code,…
0 votes, 2 answers

Manipulate a data set column in R Hadoop

I have a data set with a date column (1/10/2015, 1/10/2016, 1/10/2017). I want to change its format to just the year (2015, 2016, 2017). I need to do this using Hadoop.
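A minimal sketch of the per-record transformation, written in Python in the style of a Hadoop streaming mapper rather than with rmr2. It assumes each input line holds one date in `M/D/YYYY` or `D/M/YYYY` form (the year is the last `/`-separated part either way); the function names `to_year` and `map_years` are hypothetical.

```python
def to_year(date_str):
    """Return the year part of a date written as M/D/YYYY or D/M/YYYY."""
    # Assumption: the year is always the last '/'-separated component.
    return date_str.strip().split("/")[-1]


def map_years(lines):
    """Streaming-style map step: one input date per line, one year out."""
    for line in lines:
        if line.strip():  # skip blank lines
            yield to_year(line)
```

In an actual streaming job, the mapper script would loop over `sys.stdin` and print each result; because the transformation is purely per record, no reduce step is needed.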
0 votes, 1 answer

Installing RHadoop on a Hadoop Cluster

I am trying to install RHadoop on top of my Hadoop cluster. While installing some of the required packages I am facing the following error: > install.packages("Megh/rmr2_3.3.1.tar.gz") Installing package into ‘/usr/lib64/R/library’ (as ‘lib’ is…
Megh Vidani
0 votes, 1 answer

R installation on Hadoop Cluster

I'm setting up R on an existing Hadoop cluster. So far I've installed the R RPMs and related library packages on one of the nodes in the cluster (the edge node), and it works as expected. Do the R RPMs need to be installed on all servers in the cluster, or just the…
0 votes, 1 answer

Not getting the correct result in RHadoop map function

Below is my text file content:
name , tag/tags , location, id
xyz, abc;nhj;xygf;xyz;ajsd, jhdwegyugagdwg, T1
xasdiaos, abcd, jhdwegyugagdwg0 , T3
xyzasihd, jsdh;sdgwyi, …
KrunalParmar
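The question is truncated, so the intended output is unknown. The sketch below assumes the goal is to emit one `(tag, id)` pair per `;`-separated tag, using the four-field `name, tags, location, id` layout shown in the sample. It is a Python streaming-style map function, not rmr2 code, and the names `parse_record` and `map_tags` are hypothetical.

```python
def parse_record(line):
    """Split one 'name, tags, location, id' line into stripped fields."""
    # Assumption: exactly four comma-separated fields; other shapes raise ValueError.
    name, tags, location, rec_id = [f.strip() for f in line.split(",")]
    return name, tags, location, rec_id


def map_tags(lines):
    """Emit one (tag, id) pair per ';'-separated tag, skipping the header row."""
    for line in lines:
        if not line.strip():
            continue
        name, tags, _location, rec_id = parse_record(line)
        if name == "name":  # header row: name , tag/tags , location, id
            continue
        for tag in tags.split(";"):
            tag = tag.strip()
            if tag:
                yield tag, rec_id
```

Stripping each field matters here because the sample data has inconsistent spacing around the commas (for example `jhdwegyugagdwg0 , T3`).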
0 votes, 1 answer

Are there any methods in R-Hadoop MapReduce similar to setup() and cleanup() in Java MapReduce?

Are there any methods in R-Hadoop MapReduce similar to setup() and cleanup() in Java MapReduce? I have to run a piece of code, such as a DB call, only once, before all reducers start. Is there any provision for that when writing MapReduce code…
Naaz
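A sketch of the setup/cleanup lifecycle in the Hadoop streaming model (which rmr2 builds on), written in Python rather than rmr2. In streaming, each reduce task is a separate process that reads sorted `key<TAB>value` lines, so code before the input loop runs once per task, like Java's `setup()`, and code after it runs once per task, like `cleanup()`. Note this is once per reduce task, not once across all reducers; code that must run exactly once for the whole job typically belongs in the driver script before the job is submitted. The function `run_reducer` and its callback parameters are hypothetical.

```python
from itertools import groupby


def run_reducer(lines, setup, reduce_fn, cleanup):
    """Mimic Java's setup()/reduce()/cleanup() lifecycle for one reduce task.

    Assumes every non-blank line is 'key<TAB>value' and lines arrive
    sorted by key, as Hadoop streaming delivers them to a reducer.
    """
    context = setup()  # e.g. open a DB connection once per task
    results = []
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        # One reduce call per distinct key, with all of that key's values.
        results.append(reduce_fn(key, [value for _, value in group], context))
    cleanup(context)  # e.g. close the DB connection once per task
    return results
```

Passing the `context` returned by `setup` into every `reduce_fn` call is what lets an expensive resource, such as a database handle, be created once and reused for all keys in the task.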
0 votes, 1 answer

Is installing rhdfs and rmr2 on the master alone sufficient, or do those libraries need to be installed on the slaves as well?

When using RHadoop, a set of packages for using R with Hadoop, do I need to install the packages and Rscript on all the nodes separately, or do I just need to install them on the master machine?
Amrith Krishna
0 votes, 1 answer

RHadoop - wordcount using rmr

I am trying to run a simple rmr job using the RHadoop package, but it is not working. Here is my R script: print("Initializing…
Shashi
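For reference, the map, shuffle/sort, and reduce flow that an rmr wordcount job expresses can be sketched in Python as below. This is not rmr2's `mapreduce()` API; the local `sorted()` call is an assumption standing in for Hadoop's shuffle and sort, and the function names are hypothetical.

```python
from itertools import groupby


def mapper(lines):
    """Map step: emit (word, 1) for every whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield word, 1


def reducer(pairs):
    """Reduce step: sum counts per word; input must be sorted by word."""
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)


def word_count(lines):
    """Run map, a local stand-in for Hadoop's shuffle/sort, then reduce."""
    return dict(reducer(sorted(mapper(lines))))
```

For example, `word_count(["the cat", "the dog"])` returns `{"cat": 1, "dog": 1, "the": 2}`. Keeping the mapper and reducer as separate functions mirrors how they run as separate tasks on a cluster, which makes each one easy to test locally before submitting a job.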
0 votes, 2 answers

Where are region servers located, and other questions?

Where are region servers located? Are they co-located with the data nodes, or are the region servers and regions on different hardware? Does the WAL consist of a table's data along with the operation? What does the memstore do? It stored data of WAL means along…
koushik veldanda
0 votes, 1 answer

Hadoop streaming job fails with missing options error on using rmr package with R

I am trying to write a data frame from R to HDFS using the rmr package in RStudio on Amazon EMR. The tutorial I am following…
Kristy
0 votes, 2 answers

R: Converting large CSV files to HDFS

I am currently using R to carry out analysis. I have a large number of CSV files, all with the same headers, that I would like to process using R. I had originally read each file sequentially into R and row-bound them together before carrying out…
h.l.m
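Instead of reading every file into memory and row-binding them, files that share one header can be streamed row by row into a single combined file, which could then be copied into HDFS (for example with `hadoop fs -put`). A minimal Python sketch using the standard `csv` module; the function name `combine_csvs` is hypothetical, and it assumes every file's header matches the first one exactly.

```python
import csv
import io


def combine_csvs(sources, out):
    """Append rows from CSV sources sharing one header, writing the header once.

    Rows are copied one at a time, so memory use does not grow with file size.
    """
    writer = csv.writer(out)
    header = None
    for src in sources:
        reader = csv.reader(src)
        file_header = next(reader, None)
        if file_header is None:
            continue  # empty file: nothing to copy
        if header is None:
            header = file_header
            writer.writerow(header)
        elif file_header != header:
            raise ValueError(f"header mismatch: {file_header} != {header}")
        for row in reader:
            writer.writerow(row)
    return header


if __name__ == "__main__":
    # Tiny in-memory demo; real use would pass file handles opened with newline="".
    parts = [io.StringIO("x,y\n1,2\n"), io.StringIO("x,y\n3,4\n")]
    merged = io.StringIO()
    combine_csvs(parts, merged)
    print(merged.getvalue(), end="")
```

Raising on a header mismatch, rather than silently appending, guards against combining files whose columns differ in order or name.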
0 votes, 1 answer

How to modify R program to support RHadoop

I am new to RHadoop and R. I have a normal R program that uses library(Methylkit). I am wondering whether someone can give some insight into how to run this R program on Hadoop. What do I need to modify in the original R program? It would be really…
user4479371
0 votes, 2 answers

Hadoop streaming failed with error code 5

RHadoop program for wordcount: …