I have a dataset of about 17 GB. Because of its size, data manipulations sometimes run out of RAM and break down. I am trying to use arrow, a package developed for manipulating data that is larger than memory. After reading the csv file with read_csv_arrow, about 24 GB of memory is used. In comparison, when I read the csv file the regular way it requires only 17 GB and runs much faster.
```r
library(dplyr)
library(arrow)

Sales_Arrow <- read_csv_arrow("Budget_2023 Sales All.csv", as_data_frame = FALSE)
```
The memory usage is:

| Statistics          | Memory     |
|:--------------------|------------|
| Used by session     | 15,288 MiB |
| Used by system      | 8,351 MiB  |
| Free system memory  | 8,924 MiB  |
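From what I have read, the way arrow is supposed to handle larger-than-memory data is to query it as a dataset on disk rather than reading it all into an Arrow Table. A minimal sketch of what I understand that to look like (the `sales_parquet` folder name is just a placeholder I made up, and I have not confirmed this actually reduces peak memory in my case):

```r
library(arrow)

# Scan the csv lazily as a dataset instead of materialising it in memory
csv_ds <- open_dataset("Budget_2023 Sales All.csv", format = "csv")

# Convert it once to a Parquet dataset on disk, then query that from now on
write_dataset(csv_ds, "sales_parquet", format = "parquet")
Sales_DS <- open_dataset("sales_parquet")

# Sales_DS can then be filtered/selected with dplyr verbs, and only the
# result of collect() should be brought into RAM
```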
Continuing with the Arrow table read above, the next step, filtering the data and doing some manipulations in RAM, takes all the free memory:
```r
Current_Price <- Sales_Arrow %>%
  filter(SBPHYP >= 202101) %>%
  collect() %>%
  group_by(`Cust-Item-Loc`) %>%
  arrange(desc(SBINDT)) %>%
  slice(1)
```
It does not complete and fails with the following error messages:
```
Error: cannot allocate vector of size 64.0 Mb
Error during wrapup: cannot allocate vector of size 64.0 Mb
Error: no more error handlers available (recursive errors?); invoking 'abort' restart
```
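The only variant I have come up with so far is to push the filter and a narrower column selection down to Arrow before collect(), so that less data is materialised; the select() list below contains only the columns referenced above and is just an illustration, since the real data has more columns I would need to keep:

```r
Current_Price <- Sales_Arrow %>%
  filter(SBPHYP >= 202101) %>%                  # filter is evaluated by Arrow
  select(`Cust-Item-Loc`, SBINDT, SBPHYP) %>%   # placeholder column list, also pushed down to Arrow
  collect() %>%                                 # only the filtered, narrowed data comes into RAM
  group_by(`Cust-Item-Loc`) %>%
  arrange(desc(SBINDT)) %>%
  slice(1)
```

I am not sure whether this, or switching to the dataset approach above, is the right direction.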
Please advise.