
I am trying to rename the columns of a Spark table created with sparklyr. Here is the code:

    library(sparklyr)
    sc <- spark_connect(master = "local", config = list())
    iris_tbl <- copy_to(sc, iris, overwrite = T)
    newColList <- c("a", "b" , "c" , "d" , " e")
    colnames(iris_tbl) <- newColList 

Error:

    Error in `colnames<-`(`*tmp*`, value = c("a", "b", "c", "d", " e")) :
      'dimnames' applied to non-array


2 Answers


`names(iris_tbl) <- newColList` works, but I think a better answer would use `%>%` and `dplyr::rename`.
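
A minimal sketch of that approach, assuming `copy_to()` replaced the dots in the iris column names with underscores (`Sepal_Length`, `Sepal_Width`, and so on):

    library(dplyr)

    # rename(new = old) on a tbl_spark is translated by sparklyr into a
    # Spark SQL "SELECT ... AS ...", so nothing is mutated in place
    iris_tbl <- iris_tbl %>%
      rename(a = Sepal_Length,
             b = Sepal_Width,
             c = Petal_Length,
             d = Petal_Width,
             e = Species)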

  • `names(iris_tbl) <- newColList` is throwing error: `Error: org.apache.spark.sql.AnalysisException: cannot resolve 'e' given input columns: [b, d, e, a, c];` at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42) at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:60) at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:57) – Priyanka May 02 '17 at 10:29

I've been searching around for this all day. Right now my best solution is a custom function that goes directly to the Spark API:

    sdf_write_colnames <- function(in_tbl, new_names) {
      # name under which the table is currently registered in Spark
      sdf_name <- as.character(in_tbl$ops$x)

      # call the underlying Spark DataFrame's toDF() to rename every column,
      # then re-register the result under the original name
      in_tbl %>%
        spark_dataframe() %>%
        invoke("toDF", as.list(new_names)) %>%
        sdf_register(name = sdf_name)
    }

    iris_tbl <- sdf_write_colnames(iris_tbl, c("a", "b", "c", "d", "e"))

    head(iris_tbl)

With a bit of effort it could be made to work more like `colnames() <-`.
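
For example, a rough sketch of a replacement-function wrapper (the name `sdf_colnames<-` is made up here, not part of sparklyr) that allows assignment syntax:

    # hypothetical wrapper: lets the helper above be called with
    # assignment syntax, i.e. sdf_colnames(tbl) <- new_names
    `sdf_colnames<-` <- function(x, value) {
      sdf_write_colnames(x, value)
    }

    sdf_colnames(iris_tbl) <- c("a", "b", "c", "d", "e")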

  • I'll leave this up in case it's of use, but I've had a few problems with this. Not sure all the registering should be necessary. – dougmet May 19 '17 at 21:56