I would like to take a data frame I've created in R, turn it into a JSON object, and then read that JSON object into SparkR. In my current project I can't simply pass a data frame into SparkR, so I have to use this roundabout method. I also can't write a local JSON file first and read that in, which is why I am trying to build an in-memory JSON object that holds my data and read that into SparkR directly.
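To make the conversion step concrete, here is a minimal sketch of what I mean by "JSON object" (toy columns; the exact layout RJSONIO produces may differ slightly):

library(RJSONIO)
x <- data.frame(id = 1:2, val = c(0.5, -1.3))   # tiny stand-in for my real data
exportJson <- toJSON(x)   # a single character string of JSON, not a file and not an RDD
cat(exportJson)           # roughly: { "id": [ 1, 2 ], "val": [ 0.5, -1.3 ] }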
In other posts I've read, Spark's Scala API has a function
sqlContext.read.json(anotherPeopleRDD)
That seems to do what I am trying to accomplish. Is there something similar for SparkR?
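For comparison, the only SparkR form I know of takes a file path/URI rather than an in-memory object or an RDD; a minimal sketch (the path is just a placeholder, and this is exactly the pattern I can't use, since I can't write a JSON file first):

# SparkR 1.6: read.json(sqlContext, path) reads a JSON file from a path/URI
peopleDF <- read.json(sqlContext, "examples/src/main/resources/people.json")
head(peopleDF)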
Here is the code I am working with right now:
# Point R at the local Spark, R, and Hadoop installations
.libPaths(c(.libPaths(), '/root/Spark1.6.2/spark-1.6.2-bin-hadoop2.6/R/lib'))
Sys.setenv(SPARK_HOME = '/root/Spark1.6.2/spark-1.6.2-bin-hadoop2.6')
Sys.setenv(R_HOME = '/root/R-3.4.1')
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
Sys.setenv("spark.r.command" = '/usr/bin')
Sys.setenv(HADOOP_CONF_DIR = "/etc/hadoop/conf.cloudera.yarn")
Sys.setenv(PATH = paste(Sys.getenv(c('PATH')), '/root/Spark1.6.2/spark-1.6.2-bin-hadoop2.6/bin', sep=':'))
library(SparkR)
sparkR.stop()
sc <- sparkR.init(sparkEnvir = list(spark.shuffle.service.enabled=TRUE,spark.dynamicAllocation.enabled=TRUE, spark.dynamicAllocation.initialExecutors="2"), master = "yarn-client", appName = "SparkR")
sqlContext <- sparkRSQL.init(sc)
options(warn=-1)
n = 1000
x = data.frame(id = 1:n, val = rnorm(n))   # local R data frame I want to get into SparkR
library(RJSONIO)
exportJson <- toJSON(x)                    # exportJson is an in-memory JSON string, not a file path
testJsonData = read.json(sqlContext, exportJson) # fails -- read.json appears to expect a path/URI
collect(testJsonData)
remove(sc)
remove(sqlContext)
sparkR.stop()
options(warn=0)
Here is the error message I'm getting from read.json:
17/08/03 12:25:35 ERROR r.RBackendHandler: json on 2 failed
Error in invokeJava(isStatic = FALSE, objId$id, methodName, ...) :
  java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: {