Questions tagged [disk.frame]
30 questions
0
votes
0 answers
Unexpected symbol when concatenating strings in R
I ran into the following problem.
My dataset "Sales" is stored as a disk.frame.
There are two character variables, "Item-Entity" and "SBLOC". I want to create another variable concatenating them:
Sales <- as.disk.frame(Sales) %>%
mutate("Item-Loc" =…

grislepak
- 31
- 3
0
votes
0 answers
How many data transformations can I perform in disk.frame in R?
I have a dataset of about 16 GB. To reduce RAM usage I converted it into a disk.frame.
After a few manipulations (just mutating 10 variables) I tried to move the new table to RAM using the collect function.
The error message is the following
Error:…

grislepak
- 31
- 3
0
votes
1 answer
Remove duplicate rows in a disk.frame object
I have a disk.frame object with many duplicate rows.
How can I remove them?
(The original data frame is 10 GB in size.)

Irene M
- 11
0
votes
1 answer
How to import the data in a disk.frame folder into the R environment
There is a folder 'C:\tmp_flights.df' that was created by the disk.frame package. How can I import the data into the R environment again? Thanks!
The code below created the disk.frame folder:
library(disk.frame)
library(nycflights13)
library(tidyverse)
…
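For reference, the usual approach is to re-attach the existing folder with the `disk.frame()` constructor, which points a new disk.frame object at a directory already on disk (a minimal sketch, assuming the path from the question):

```r
library(disk.frame)
setup_disk.frame()  # start the background workers

# re-attach the folder that csv_to_disk.frame / as.disk.frame created earlier
flights.df <- disk.frame("C:/tmp_flights.df")

# bring the data back into RAM as an ordinary data frame
flights <- collect(flights.df)
```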

anderwyang
- 1,801
- 4
- 18
0
votes
1 answer
Using disk.frame, but still hitting the memory limit
Problem:
I am trying to perform a correlation test on a large dataset: the data.table can exist in memory, but operating on it with Hmisc::rcorr() or corrr::correlate() eventually runs into the memory limit.
> Error: cannot allocate vector of size…

Buzz B
- 75
- 7
0
votes
0 answers
Still getting "cannot allocate vector of size" issues despite using disk.frame in R
I've been trying to work with disk.frame to load a file that's about 45 GB. I used the code below to convert the CSV to a disk.frame:
output_path = file.path(tempdir(), "tmp_cars.df")
disk <- csv_to_disk.frame("full-drivers.csv", outdir =…

Shazzzam
- 1
- 2
0
votes
0 answers
Summary statistics on out-of-memory file
I have a 120 GB csv file containing numerical values grouped by categorical variables.
e.g.
df <- data.frame(x = c(rep("BLO", 100), rep("LR", 100)), y = runif(200))
I would like to calculate some summary statistics using…

HCAI
- 2,213
- 8
- 33
- 65
0
votes
0 answers
How to store where a passenger gets on and off a train whilst minimising size of file for plotting?
I have 500 GB of .csv data that includes these three variables (among others): 1. where a passenger gets on a train, 2. where they get off, and 3. the time it takes.
I need to make box plots of the time it takes based on where they got on and where they…

HCAI
- 2,213
- 8
- 33
- 65
0
votes
1 answer
Columns jumbled after using csv_to_disk.frame
I have around 15 GB of zipped data in 30-minute packages. Unzipping and reading them with either unzip and readr, or with fread, works just fine, but the RAM requirements don't allow me to read in as many files as I wish. So I've tried to use the disk.frame…

D.J
- 1,180
- 1
- 8
- 17
0
votes
1 answer
How should we choose the compression rate with rbindlist.disk.frame?
It's set to 50 by default on a scale of 1 to 100.
I have an especially large disk frame and I'm considering using a high number.
What are the important trade-offs to consider?

Cauder
- 2,157
- 4
- 30
- 69
0
votes
2 answers
My group by doesn't appear to be working in disk.frame
I ran a group by on a large dataset (>20 GB) and it doesn't appear to be working quite right.
This is my code
mydf[, .(value = n_distinct(list_of_id, na.rm = T)),
by = .(week),
keep = c("list_of_id",…

Cauder
- 2,157
- 4
- 30
- 69
0
votes
1 answer
How does srckeep affect the underlying disk frame?
I have a disk frame with these columns
key_a
key_b
key_c
value
Say the disk frame is 200M rows and I'd like to group it by key_b. Additionally, I want to keep the underlying disk frame intact and unchanged so I could later on join it to something…
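For context, srckeep only restricts which columns are read from disk for the pipeline that follows; it does not modify the files on disk, so the underlying disk.frame is left intact. A sketch using the column names from the question (the grouping and sum are illustrative assumptions):

```r
library(disk.frame)
library(dplyr)
setup_disk.frame()

# assume df is a disk.frame with columns key_a, key_b, key_c, value
result <- df %>%
  srckeep(c("key_b", "value")) %>%  # only these columns are loaded per chunk
  group_by(key_b) %>%
  summarise(total = sum(value)) %>%
  collect()

# df itself is unchanged on disk and can still be joined to something later
```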

Cauder
- 2,157
- 4
- 30
- 69
0
votes
1 answer
How do I bind two disk frames together?
I have two disk.frames, each about 20 GB worth of files.
They're too big to merge as data.tables because the process requires more memory than I have available. I tried using this code: output <- rbindlist(list(df1, df2))
The wrinkle is that…
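The package ships a disk-backed analogue of rbindlist, which writes the combined chunks to a new folder rather than materialising both tables in RAM (a sketch; the output path is an assumption):

```r
library(disk.frame)
setup_disk.frame()

# df1 and df2 are existing disk.frames; the result is written to disk,
# so neither input needs to fit in memory at once
output <- rbindlist.disk.frame(
  list(df1, df2),
  outdir = file.path(tempdir(), "combined.df"),
  overwrite = TRUE
)
```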

Cauder
- 2,157
- 4
- 30
- 69
0
votes
1 answer
How do I find out how many workers my disk.frame is using?
I am using the disk.frame package and I want to know how many workers disk.frame is using to perform operations. I looked through the disk.frame documentation and can't find such a function.
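Worth noting: disk.frame delegates its parallelism to the future framework, so the worker count is queried through future rather than through disk.frame itself (a sketch; `nbrOfWorkers` is future's API, not disk.frame's):

```r
library(disk.frame)

# sets up a future plan with the requested number of background workers
setup_disk.frame(workers = 4)

# ask future how many workers the current plan provides;
# this is how many chunks disk.frame can process in parallel
future::nbrOfWorkers()
```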

xiaodai
- 14,889
- 18
- 76
- 140
-1
votes
2 answers
Does disk.frame allow working with large lists in R?
I am producing very big datasets (>120 GB) that are actually lists of named (100x100x3) matrices: very large lists (millions of records). They are then fed to a CNN and classified into one of 4 categories. Processing this amount of data at once…

ramen
- 691
- 4
- 20