In R, I have a Spark connection and a Spark DataFrame called ddf.

library(sparklyr)
library(tidyverse)
sc <- spark_connect(master = "foo", version = "2.0.2")
ddf <- spark_read_parquet(sc, name='test', path="hdfs://localhost:9001/foo_parquet")

Since it doesn't have many rows, I'd like to pull it into memory to apply some machine learning magic. However, it seems that certain rows cannot be collected.

df <- ddf %>% head %>% collect # works fine
df <- ddf %>% collect # doesn't work

The second line of code throws the error "Error in rawToChar(raw) : embedded nul in string:". The column/row it fails on contains some string data. Since head %>% collect works, it seems that only some rows fail while others are collected as expected.

How can I work around this error? Is there a way to clean up the offending data? And what does the error actually mean?
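
For context on the message itself, here is a minimal base R sketch, independent of Spark, that reproduces the same kind of error. R character strings cannot contain a NUL byte (0x00), so rawToChar() fails when the raw bytes it is given include one. The byte values and the variable names (raw_bytes, result, cleaned) are made up for illustration; that the failing Spark column contains such bytes is an assumption based on the error text, not something confirmed here.

```r
# Hypothetical payload: the bytes for "a", NUL, "b".
raw_bytes <- as.raw(c(0x61, 0x00, 0x62))

# rawToChar() refuses to build an R string that contains a NUL byte;
# capture the error message instead of stopping the script.
result <- tryCatch(
  rawToChar(raw_bytes),
  error = function(e) conditionMessage(e)
)
print(result)

# Dropping the NUL bytes before converting lets the conversion succeed.
cleaned <- rawToChar(raw_bytes[raw_bytes != as.raw(0)])
print(cleaned)
```

If the assumption holds, stripping or replacing NUL characters in the affected string column on the Spark side before calling collect would be one direction to investigate.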

zero323
Tim
    What are the data types of the columns in Spark? And can you provide sample data? – Wil Aug 23 '20 at 12:41
