When using the functions chunked::read_chunkwise and dplyr::filter in a pipe, I get an error whenever the filter returns an empty dataset for any of the chunks. In other words, the error occurs when all the rows of a given chunk are filtered out.
Here is a modified example, adapted from the chunked package help file:
library(chunked); library(dplyr)
# create csv file for demo purpose
in_file <- file.path(tempdir(), "in.csv")
write.csv(women, in_file, row.names = FALSE, quote = FALSE)
# reading chunkwise and filtering
women_chunked <-
  read_chunkwise(in_file, chunk_size = 3) %>% # read only a few lines per chunk for the purpose of this example
  filter(height > 150) # This filters out every row of the dataset (heights range from 58 to 72),
# so for instance the first chunk (first 3 rows) returns an empty table
# Trying to read the output returns an error message
women_chunked
# >Error in UseMethod("groups") :
# >no applicable method for 'groups' applied to an object of class "NULL"
# As does of course trying to write the output to a file
out_file <- file.path(tempdir(), "processed.csv")
women_chunked %>%
  write_chunkwise(file = out_file)
# >Error in read.table(con, nrows = nrows, sep = sep, dec = dec, header = header, :
# >first five rows are empty: giving up
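The empty-chunk condition can be checked independently of chunked. The sketch below is only an illustration in base R: it assigns the rows of `women` to consecutive groups of 3 (mimicking `chunk_size = 3`) and counts how many rows per group pass the filter. The names `chunk_id` and `rows_kept` are made up for this example.

```r
# Assign each row of `women` to a chunk of 3 consecutive rows,
# mirroring chunk_size = 3 in the example above
chunk_id <- ceiling(seq_len(nrow(women)) / 3)

# Count, per chunk, how many rows satisfy the filter condition
rows_kept <- tapply(women$height > 150, chunk_id, sum)

# A count of 0 means the whole chunk is filtered out
rows_kept
```

With this condition every chunk has a count of 0, so each chunk handed to the filter comes back empty.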
I am working with many csv files, each with 50 million rows, so I will often end up in a similar situation, where the filtering returns an empty table for at least some chunks.
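For reference, the behaviour wanted here (empty chunks are skipped and the rest is written out) can be sketched in base R without the chunked pipe. This is a fallback under stated assumptions, not a fix for chunked itself: `filter_csv_in_chunks` and its arguments are hypothetical names, the header is assumed to be a plain comma-separated line without quoted fields, and the threshold of 65 is chosen only so that some chunks are empty and others are not.

```r
# Hypothetical helper (not part of chunked or dplyr): filter a csv file
# chunk by chunk in base R, skipping chunks where no row passes the filter.
filter_csv_in_chunks <- function(in_file, out_file, keep, chunk_size = 3) {
  con <- file(in_file, open = "r")
  on.exit(close(con))
  # Assumes a simple comma-separated header without quoted fields
  header <- strsplit(readLines(con, n = 1), ",", fixed = TRUE)[[1]]
  wrote_any <- FALSE
  repeat {
    lines <- readLines(con, n = chunk_size)
    if (length(lines) == 0) break            # end of file
    chunk <- read.csv(text = lines, header = FALSE, col.names = header)
    # which() drops rows where the condition is NA, as dplyr::filter does
    kept <- chunk[which(keep(chunk)), , drop = FALSE]
    if (nrow(kept) == 0) next                # empty chunk: nothing to write
    # Write the header only once, then append the following chunks
    write.table(kept, out_file, sep = ",", quote = FALSE, row.names = FALSE,
                col.names = !wrote_any, append = wrote_any)
    wrote_any <- TRUE
  }
  invisible(wrote_any)  # FALSE if no row passed the filter in any chunk
}

# Usage on the demo data: the first chunks are entirely filtered out
in_file <- file.path(tempdir(), "in.csv")
out_file <- file.path(tempdir(), "processed_base.csv")
write.csv(women, in_file, row.names = FALSE, quote = FALSE)
filter_csv_in_chunks(in_file, out_file, keep = function(d) d$height > 65)
result <- read.csv(out_file)
```

The function returns `FALSE` invisibly, and writes no file, when every chunk is empty, so the caller can tell that case apart. It gives up the lazy dplyr pipe, which is what I would like to keep.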
I couldn't find a solution or any post related to this problem. Any suggestions? I do not think the sessionInfo output is useful in this case, but please let me know if I should post it anyway. Thanks a lot for any help!