
I got this error:

[1] "testtt"
Error in unique.default(x, nmax = nmax) : 
  unique() applies only to vectors
Calls: as.factor -> factor -> unique -> unique.default
Execution halted


Here is my R code:

library(SparkR)

sc <- sparkR.init(appName="SparkR-W1-example")
sqlContext <- sparkRSQL.init(sc)

babiesOR <- read.df(sqlContext, "/root/Desktop/babies.csv", "com.databricks.spark.csv", header="true")
print('testtt')


localDf <- collect(babiesOR)
babies <- createDataFrame(sqlContext, localDf)

babies$bwt2 = as.factor( babies$bwt2 )  
class(babies)

How can I solve this problem?

leppie
user2492364

1 Answer


There are two problems with your code. First, factor is not a SparkR type; you are restricted to strings. Second, the as.* conversion functions (such as as.factor) do not work on DataFrame columns. Use the following conversion code instead:

babies$bwt2 <- cast(babies$bwt2,'string')
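If an actual R factor is needed, one option is to do the type change on the Spark side with cast() and apply as.factor() only after the data has been collected into a local data.frame. The following is a minimal sketch of that idea, not a tested recipe: it reuses the file path, app name, and column name (bwt2) from the question, and assumes the SparkR 1.4.x API and that the spark-csv package is available.

```r
library(SparkR)

# Assumption: SparkR 1.4.x style initialisation, as in the question
sc <- sparkR.init(appName = "SparkR-W1-example")
sqlContext <- sparkRSQL.init(sc)

# Path and data source name are taken from the question
babies <- read.df(sqlContext, "/root/Desktop/babies.csv",
                  "com.databricks.spark.csv", header = "true")

# Distributed side: babies$bwt2 is a SparkR Column, not an R vector,
# so change its type with cast() rather than as.factor()
babies$bwt2 <- cast(babies$bwt2, "string")
printSchema(babies)

# Local side: collect() returns an ordinary R data.frame,
# so base R functions such as as.factor() apply to its columns
localDf <- collect(babies)
localDf$bwt2 <- as.factor(localDf$bwt2)
class(localDf$bwt2)
```

Note that collect() pulls the whole dataset into the driver's memory, so this only makes sense when the data is small enough to fit locally.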
Wannes Rosiers
  • I thought I could use plain R code, but it seems I have to learn the SparkR methods. Where can I find a tutorial? Thanks. – user2492364 Aug 14 '15 at 07:30
  • You can use some R code, but not all; more will probably be supported in future releases. The complete documentation of SparkR 1.4.1 is here: https://spark.apache.org/docs/latest/api/R/index.html. I don't know of any good tutorials yet. – Wannes Rosiers Aug 14 '15 at 11:15