
With Spark 1.5.0 installed on my Mac, I'm trying to initialise a Spark context with the com.databricks:-csv_2.11:1.2.0 package in RStudio, as follows:

Sys.setenv('SPARKR_SUBMIT_ARGS'='"--packages" "com.databricks:-csv_2.11:1.2.0" "sparkr-shell"')
library(SparkR, lib.loc = "spark-1.5.0-bin-hadoop2.6/R/lib/")
sc <- sparkR.init(sparkHome = "spark-1.5.0-bin-hadoop2.6/")

But I'm getting the following error message:

[unresolved dependency: com.springml#spark-salesforce_2.10;1.0.1: not found]

Why does that happen?

P.S. The initialisation works fine when I use com.databricks:spark-csv_2.10:1.0.3.

UPDATE

I tried version com.databricks:spark-csv_2.10:1.2.0 and the initialisation works fine.

Now, when I use this code in RStudio to load a CSV file:

sqlContext <- sparkRSQL.init(sc)
flights <- read.df(sqlContext, "R/nycflights13.csv", "com.databricks.spark.csv", header="true")

I get the following error message:

Error in writeJobj(con, object) : invalid jobj 1

When I execute sqlContext I get the error:

Error in callJMethod(x, "getClass") : 
  Invalid jobj 1. If SparkR was restarted, Spark operations need to be re-executed.

Session info:

R version 3.2.0 (2015-04-16)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.10.2 (Yosemite)

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] SparkR_1.5.0 rJava_0.9-7 

loaded via a namespace (and not attached):
[1] tools_3.2.0

Note that I don't get this error when I run the same commands in the Spark shell.

  • See my answer here: http://stackoverflow.com/q/32873434/1560062, 2.11 is the Scala version and the pre-built binaries use Scala 2.10. If you want to use 2.11 you have to [build Spark from source with Scala 2.11](http://spark.apache.org/docs/latest/building-spark.html#building-for-scala-211). Moreover, you have a typo in the package name - it should be `com.databricks:spark-csv_2.11:1.2.0`. If you simply want to use a recent version of `spark-csv` on 2.10, use `com.databricks:spark-csv_2.10:1.2.0` (see the sketch after these comments). – zero323 Oct 01 '15 at 14:40
  • Thanks @zero323. Just tried com.databricks:spark-csv_2.10:1.2.0, but I got the following error when I loaded a CSV file: Error in writeJobj(con, object) : invalid jobj 1 – Stan Oct 01 '15 at 15:18
  • The error "Error in writeJobj(con, object) : invalid jobj 1" does not occur when I use the Spark shell instead of RStudio. Any idea why? – Stan Oct 01 '15 at 15:22
  • Not really. I cannot reproduce this problem. Could you provide more details (OS, R version, RStudio version, session info)? – zero323 Oct 01 '15 at 15:26
  • @zero323 I updated the question with more info – Stan Oct 01 '15 at 15:34
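
To illustrate the fix suggested in the comments, a minimal sketch: only the package coordinate changes, so that the artifact matches the Scala 2.10 pre-built Spark 1.5.0 binaries.

# Use the _2.10 artifact, which matches the Scala version of the pre-built Spark 1.5.0 binaries
Sys.setenv('SPARKR_SUBMIT_ARGS'='"--packages" "com.databricks:spark-csv_2.10:1.2.0" "sparkr-shell"')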

1 Answer


Problem solved.

Everything works now after restarting the R session and using the following code:

# Pull in the Scala 2.10 build of spark-csv, matching the pre-built Spark binaries
Sys.setenv('SPARKR_SUBMIT_ARGS'='"--packages" "com.databricks:spark-csv_2.10:1.2.0" "sparkr-shell"')
library(rJava)
library(SparkR, lib.loc = "spark-1.5.0-bin-hadoop2.6/R/lib/")

# Start a local Spark context and a SQL context on top of it
sc <- sparkR.init(master = "local", sparkHome = "spark-1.5.0-bin-hadoop2.6")
sqlContext <- sparkRSQL.init(sc)

# Read the CSV through the spark-csv data source, treating the first row as a header
flights <- read.df(sqlContext, "R/nycflights13.csv", "com.databricks.spark.csv", header="true")
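
A minimal sanity check, assuming the CSV loaded successfully:

head(flights)        # first rows of the SparkR DataFrame
printSchema(flights) # column names and types as read by spark-csv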