I have an R data frame that I would like to convert into a Spark data frame on a remote cluster. I have decided to write my data frame to an intermediate CSV file that is then read using sparklyr::spark_read_csv(). I am doing this because the data frame is too big to send directly using sparklyr::sdf_copy_to() (which I think is due to a limitation in Livy).
I would like to programmatically transfer the R column types used in the data frame to the new Spark data frame, by writing a function that returns a named vector that I can pass to the columns argument of spark_read_csv().
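
For illustration, a minimal sketch of the kind of function I have in mind is below. The class-to-type mapping is my own assumption (based on the type names documented for the columns argument of spark_read_csv(), e.g. "integer", "double", "character", "boolean", "date", "timestamp"), and r_to_spark_columns() and my_df are hypothetical names:

```r
library(sparklyr)

# Sketch (assumed mapping): translate each column's R class into a type
# name accepted by the `columns` argument of spark_read_csv(). Extend the
# map for any other classes your data contains.
r_to_spark_columns <- function(df) {
  type_map <- c(
    integer   = "integer",
    numeric   = "double",
    character = "character",
    logical   = "boolean",
    factor    = "character",  # factors serialize to plain strings in CSV
    Date      = "date",
    POSIXct   = "timestamp"
  )
  r_classes <- vapply(df, function(col) class(col)[1], character(1))
  types <- unname(type_map[r_classes])
  if (anyNA(types)) {
    stop("Unmapped column classes: ",
         paste(unique(r_classes[is.na(types)]), collapse = ", "))
  }
  stats::setNames(types, names(df))
}

# Hypothetical usage: write the CSV without row names, then read it back
# with the explicit schema instead of letting Spark infer one.
# write.csv(my_df, "my_df.csv", row.names = FALSE)
# sdf <- spark_read_csv(sc, name = "my_df", path = "my_df.csv",
#                       columns = r_to_spark_columns(my_df),
#                       infer_schema = FALSE)
```

Is this roughly the right approach, or is there a built-in way to derive the columns vector from an R data frame?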