Questions tagged [rhadoop]

RHadoop is a combination of R and Hadoop for managing and analyzing data with Hadoop.

RHadoop is a collection of three R packages that allow users to manage and analyze data with Hadoop. The packages have been implemented and tested in Cloudera's distribution of Hadoop (CDH3 and CDH4) and R 2.15.0. The packages have also been tested with Revolution R 4.3, 5.0, and 6.0. For rmr, see Compatibility.

Source: GitHub: Revolution Analytics (RHadoop)

112 questions
2
votes
1 answer

rmr2 is duplicating the keys from my mapper

For some reason rmr2 seems to be improperly processing keys in certain circumstances, duplicating the key for each value. I am using R version 3.1.1, the 64-bit version, under Windows 7. My rmr version is rmr2_2.3.0. I am using the local mode by…
tlarchuk
2
votes
4 answers

R + Hadoop with RHadoop job fails on Single Machine Cluster

Apologies in advance for being a newbie and perhaps asking stupid questions. I have installed Hadoop on a Single Machine Cluster (Ubuntu 14.04) and successfully tested the very basic program specified in the Apache installation guide. Subsequently I…
Calcutta
2
votes
2 answers

debugging mapreduce() function in R

Today I started working with the rhdfs and rmr2 packages. The mapreduce() function worked as expected on a 1D vector. Piece of code on the 1D vector: a1 <- to.dfs(1:20) a2 <- mapreduce(input=a1, map=function(k,v) keyval(v, v^2)) a3 <-…
Kumar
2
votes
1 answer

copying local folder to hdfs through R

I am trying to export a folder from my local file system to HDFS. I am running the code through R. How can I do it? Any suggestions are welcome.
arsalan.jawed
2
votes
1 answer

Install RHadoop on 32-bit Ubuntu

Objective: To install RHadoop on a single system (not a VM version). System Specification: 32-bit processor, 2GB RAM, Windows 7 & Ubuntu 12.10. Explanation: I am trying to run Hadoop with R using the RHadoop library. Since my system has little RAM, if I try…
Ankit
1
vote
0 answers

Executing parallel function calls in R

Currently I am using a foreach loop in R to run parallel function calls on multiple cores of the same machine, and the code looks something like this: result=foreach(i=1:length(list_of_dataframes)) { temp=some_function(list_of_dataframes[[i]]) …
1
vote
1 answer

What if the data is too big for 1 reducer (RHadoop)?

I'm new to big data and Hadoop. I am trying to find the median with MapReduce. From what I know, the mappers pass the data to 1 reducer, which then sorts it and finds the middle value using the median() function. R runs in memory, so what if the data is too big to store…
1
vote
0 answers

How to read parquet files from HDFS in R

I need to read parquet files stored on HDFS (I have a Kerberos-protected Hadoop cluster) in my R program. I came across a couple of packages, but none of them completely satisfies what I need. rhadoop: It looks like an old project with no further…
HHH
1
vote
0 answers

R fatal error while doing grep and simple rhdfs command

I'm trying to break up some log files using R through the 'rhdfs' and 'rmr2' packages. The source is in a local Linux directory in the cloud, and the destination folder where I'm trying to store the file is in an HDFS cluster. The code was working fine until last…
1
vote
1 answer

hadoop streaming failed with error code 1 in RHadoop

I am working with RHadoop by the following…
1
vote
0 answers

How to compute MapReduce-based mutual information in R?

I want to compute mutual information for all x, y in features: I(x, y). So I need to compute P(x), P(y) and P(x, y) in the data. For example: X Y - - yes 2 no 2 yes 2 no 1 yes 1 p(yes)=3/5 p(2)=3/5 p(yes,2)=2/5 counting in…
1
vote
1 answer

Why am I unable to install the R package stringi?

Problem installing stringi package during R library installation. During the installation of the package, I get an error when I connect to the URL and receive "icudt551.zip". However, the current situation is that if you have the file "icudt551.zip"…
1
vote
0 answers

I'm getting NULL for both key and value in RHadoop code

train.mr <- mapreduce(train.hdfs, map = function(k, v) { keyval(k, v$item) }, reduce = function(k, v) { m <- merge(v, v) keyval(m$x, m$y) }) from.dfs(train.mr) If I try to execute the above type of code, I get…
Ranjitha
1
vote
0 answers

function with dot(.) and comma(,) put together

map_wc <- function(.,lines) { lines_lst = unlist(strsplit(lines,"\r\n",fixed=TRUE)) l_cnt<-1; keys_l<-c() …
linkonabe
1
vote
0 answers

hadoop streaming failed with error code 1 in rstudio-server

I use a single node. I installed rmr2 and hdfs in sudo R. I wrote some code in rstudio-server, but an error occurred. I don't know what's wrong. Thanks for reading. If someone can help me, I will appreciate it. > library("rmr2",…
yes89929