
How do I install RHadoop from GitHub using devtools?

I want to install rhdfs from https://github.com/RevolutionAnalytics/rhdfs, but it does not work.

I tried the following:

    > install_github("https://github.com/RevolutionAnalytics/rhdfs")
    Error in username %||% getOption("github.user") %||% stop("Unknown username.") : 
      Unknown username.
    > install_git("https://github.com/RevolutionAnalytics/rhdfs")
    Downloading git repo https://github.com/RevolutionAnalytics/rhdfs
    Error: Does not appear to be an R package (no DESCRIPTION)
    > install_git("https://github.com/RevolutionAnalytics/rhdfs/pkg")
    Downloading git repo https://github.com/RevolutionAnalytics/rhdfs/pkg
    Error in git2r::clone(x$url, bundle, progress = FALSE) : 
      Error in 'git2r_clone': Unexpected HTTP status code: 404

I also tried:

    > url="https://github.com/RevolutionAnalytics/rhdfs/blob/master/pkg/R/hdfs.r"
    > source_url(url = url)
    SHA-1 hash of file is 106c6441dcc7e8e4ee21a6dd3725ca21c4103ce7
    Error in source(temp_file, ...) : 
      /tmp/RtmplCSWzE/file16aa477002e1:4:1: unexpected '<'
    3: 
    4: <
Ajay Ohri

2 Answers


Two problems here:

  1. The installation instructions for RHadoop are less than clear. However, you still need to read the manual for RHadoop (e.g. https://github.com/RevolutionAnalytics/RHadoop/wiki/user%3Ermr%3EHome) for steps such as setting environment variables like HADOOP_CMD.

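  2. You are using the wrong syntax for install_github(): it expects the repository as "username/repo" rather than a full URL, and the R package itself lives in the repository's pkg subdirectory (which is why install_git() reported no DESCRIPTION at the top level). Try: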

    devtools::install_github("RevolutionAnalytics/rhdfs", subdir = "pkg")
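Separately, the source_url() attempt fails with `unexpected '<'` because a GitHub `.../blob/...` URL returns the HTML page that displays the file, not the file itself, so source() stops at the first HTML tag. Raw file contents are served from raw.githubusercontent.com. The sketch below uses a hypothetical helper, `blob_to_raw()` (not part of devtools), to rewrite one URL form into the other. Even with the raw URL, sourcing a single file such as hdfs.r is not a substitute for installing the package, which also needs its other files and its rJava dependency.

```r
# Hypothetical helper (not part of devtools): turn a GitHub "blob" page URL
# into the raw-file URL that source() / devtools::source_url() can parse.
blob_to_raw <- function(url) {
  # Swap the host: raw file contents are served from raw.githubusercontent.com
  url <- sub("^https://github\\.com/", "https://raw.githubusercontent.com/", url)
  # Drop the "/blob/" path segment, which only exists on the HTML page URL
  sub("/blob/", "/", url, fixed = TRUE)
}

url <- "https://github.com/RevolutionAnalytics/rhdfs/blob/master/pkg/R/hdfs.r"
print(blob_to_raw(url))
# [1] "https://raw.githubusercontent.com/RevolutionAnalytics/rhdfs/master/pkg/R/hdfs.r"
```

URLs that are not GitHub blob pages pass through unchanged, since neither substitution matches.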
    
Ken Benoit

Courtesy of @pascal above

You can download the RHadoop packages and install them from here:

https://github.com/RevolutionAnalytics/RHadoop/wiki/Downloads

rhdfs has some prerequisites:

https://github.com/RevolutionAnalytics/RHadoop/wiki/user%3Erhdfs%3EHome

  Prerequisites

    This package has a dependency on rJava
    Access to HDFS via this R package is dependent upon the HADOOP_CMD environment variable. HADOOP_CMD points to the full path for the hadoop binary. If this variable is not properly set, the package will fail when the init() function is invoked
    Example:

HADOOP_CMD=/usr/bin/hadoop 
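Because rhdfs fails at initialization (its initializer is hdfs.init()) when HADOOP_CMD is wrong, it can help to check the variable from R before loading the package. This is a minimal sketch, assuming the hadoop binary lives at /usr/bin/hadoop as in the example above; `hadoop_cmd_ok()` is a hypothetical helper, not part of rhdfs.

```r
# Hypothetical helper (not part of rhdfs): TRUE when HADOOP_CMD is set and
# names an existing, executable file (not a directory).
hadoop_cmd_ok <- function() {
  cmd <- Sys.getenv("HADOOP_CMD")
  nzchar(cmd) && file.exists(cmd) && !dir.exists(cmd) &&
    unname(file.access(cmd, mode = 1)) == 0
}

# Assumption: hadoop is installed at /usr/bin/hadoop; adjust for your system.
# Set the variable before loading rhdfs.
Sys.setenv(HADOOP_CMD = "/usr/bin/hadoop")

if (hadoop_cmd_ok()) {
  library(rhdfs)
  hdfs.init()
} else {
  message("HADOOP_CMD is unset or not an executable file: '",
          Sys.getenv("HADOOP_CMD"), "'")
}
```

Setting the variable in the shell before starting R (as in the example above) works just as well; Sys.setenv() only affects the current R session.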
Ajay Ohri