Questions tagged [disk.frame]

30 questions
0
votes
0 answers

Unexpected symbol when concatenating strings in R

I ran into the following problem. My dataset "Sales" is stored as a disk.frame. There are two character variables, "Item-Entity" and "SBLOC", and I want to create another variable by concatenating them: Sales <- as.disk.frame(Sales) %>% mutate("Item-Loc" =…
grislepak
  • 31
  • 3
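The "unexpected symbol" in the excerpt above typically comes from the non-syntactic column names; back-ticking them inside mutate() is one fix. A minimal sketch, assuming Sales is already a disk.frame with the columns shown:

    library(disk.frame)
    library(dplyr)
    setup_disk.frame()
    # dplyr verbs run chunk-wise on a disk.frame; back-ticks handle the
    # non-syntactic names "Item-Entity" and "Item-Loc"
    Sales <- Sales %>%
      mutate(`Item-Loc` = paste0(`Item-Entity`, "-", SBLOC))
    head(collect(Sales))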
0
votes
0 answers

How many data transformations can I perform in disk.frame in R?

I have a dataset of about 16 GB. To reduce RAM usage I converted it to a disk.frame. After a few manipulations (just mutating 10 variables) I tried to move the new table back into RAM using the collect function. The error message is the following: Error:…
grislepak
  • 31
  • 3
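There is no fixed cap on chunk-wise transformations; the failure point is usually collect(), which must fit the entire result in RAM. A hedged sketch of one workaround, writing the result back to disk instead of collecting it (my_df, the variable names, and the paths are illustrative):

    library(disk.frame)
    library(dplyr)
    setup_disk.frame()
    result <- my_df %>%
      mutate(v1 = v1 * 2, v2 = log(v2)) %>%    # chunk-wise transformations
      write_disk.frame(outdir = "result.df")   # materialise on disk, not RAM
    # pull only the columns you actually need into memory
    small <- result %>% srckeep(c("v1", "v2")) %>% collect()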
0
votes
1 answer

Remove duplicate rows in a disk.frame object

I have a disk.frame object with many duplicate rows. How can I remove them? (The original data frame is 10 GB in size.)
Irene M
  • 11
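Duplicates can straddle chunk boundaries, so a chunk-wise unique() is only safe after resharding so identical rows land in the same chunk. A minimal sketch, assuming a column "id" shared by all duplicates:

    library(disk.frame)
    library(dplyr)
    setup_disk.frame()
    sharded <- shard(df, shardby = "id", outdir = "sharded.df",
                     overwrite = TRUE)
    # full-row duplicates now sit in the same chunk, so unique() per chunk
    # removes them globally
    deduped <- cmap(sharded, ~ unique(.x)) %>%
      write_disk.frame(outdir = "deduped.df", overwrite = TRUE)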
0
votes
1 answer

How to import the data in a disk.frame folder back into the R environment

There is a folder 'C:\tmp_flights.df' that was created by the disk.frame package; how can I import the data back into the R environment? Thanks! The code below created the disk.frame folder: library(disk.frame) library(nycflights13) library(tidyverse) …
anderwyang
  • 1,801
  • 4
  • 18
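Re-attaching an existing folder is what the disk.frame() constructor is for; a minimal sketch:

    library(disk.frame)
    setup_disk.frame()
    flights.df <- disk.frame("C:/tmp_flights.df")  # re-attach the folder
    flights <- collect(flights.df)                 # load it back into RAM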
0
votes
1 answer

Using disk.frame, but still hitting the memory limit

Problem: I am trying to perform a correlation test on a large dataset: the data.table fits in memory, but operating on it with Hmisc::rcorr() or corrr::correlate() eventually runs into the memory limit. > Error: cannot allocate vector of size…
Buzz B
  • 75
  • 7
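Pearson correlations only need n, the column sums, and the cross-product matrix, all of which can be accumulated one chunk at a time. A hedged sketch, assuming a disk.frame df whose columns are all numeric with no missing values:

    library(disk.frame)
    setup_disk.frame()
    n  <- 0
    s  <- NULL   # running column sums
    cp <- NULL   # running t(X) %*% X
    for (i in seq_len(nchunks(df))) {
      x  <- as.matrix(get_chunk(df, i))
      n  <- n + nrow(x)
      s  <- if (is.null(s))  colSums(x)   else s  + colSums(x)
      cp <- if (is.null(cp)) crossprod(x) else cp + crossprod(x)
    }
    mu <- s / n
    covmat <- (cp - n * tcrossprod(mu)) / (n - 1)  # sample covariance
    R <- cov2cor(covmat)                           # Pearson correlations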
0
votes
0 answers

Still getting "cannot allocate vector of size" errors despite using disk.frame in R

I've been trying to use disk.frame to load a file that's about 45 GB. I used the code below to convert the CSV to a disk.frame: output_path = file.path(tempdir(), "tmp_cars.df") disk <- csv_to_disk.frame("full-drivers.csv", outdir =…
Shazzzam
  • 1
  • 2
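One knob worth checking is in_chunk_size, which makes csv_to_disk.frame() read the file in pieces rather than all at once; collecting the full 45 GB afterwards will still blow the limit, so keep the work chunk-wise. A minimal sketch:

    library(disk.frame)
    setup_disk.frame()
    output_path <- file.path(tempdir(), "tmp_cars.df")
    # read ~1e6 rows per pass so the whole CSV never sits in RAM
    disk <- csv_to_disk.frame("full-drivers.csv", outdir = output_path,
                              in_chunk_size = 1e6)
    head(disk)  # avoid collect(disk) on the full table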
0
votes
0 answers

Summary statistics on out-of-memory file

I have a CSV file that's 120 GB in size, which is a set of numerical values grouped by categorical variables, e.g. df <- data.frame(x = c(rep("BLO", 100), rep("LR", 100)), y = runif(200)). I would like to calculate some summary statistics using…
HCAI
  • 2,213
  • 8
  • 33
  • 65
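For mergeable statistics (sums, counts, and therefore means), a two-stage aggregation keeps only small per-group partials in RAM. A hedged sketch using the x/y names from the excerpt above (file path assumed):

    library(disk.frame)
    library(dplyr)
    setup_disk.frame()
    df <- csv_to_disk.frame("big.csv", outdir = "big.df", in_chunk_size = 2e6)
    stats <- df %>%
      srckeep(c("x", "y")) %>%           # read only the needed columns
      chunk_group_by(x) %>%              # stage 1: per-chunk partials
      chunk_summarize(s = sum(y), n = n()) %>%
      collect() %>%
      group_by(x) %>%                    # stage 2: combine in RAM
      summarize(mean_y = sum(s) / sum(n))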
0
votes
0 answers

How to store where a passenger gets on and off a train whilst minimising size of file for plotting?

I have 500 GB of .csv data which includes these three (and other) variables: 1. where a passenger gets on a train, 2. where they get off, and 3. the time it takes. I need to make box plots of the time it takes based on where they got on and where they…
HCAI
  • 2,213
  • 8
  • 33
  • 65
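A box plot only needs five numbers per group, so the 500 GB can be reduced to a tiny summary table before plotting. A hedged sketch, assuming columns named on, off, and time, and sharding so each (on, off) pair sits wholly in one chunk:

    library(disk.frame)
    library(dplyr)
    library(ggplot2)
    setup_disk.frame()
    trips <- csv_to_disk.frame("trips.csv", outdir = "trips.df",
                               in_chunk_size = 2e6, shardby = c("on", "off"))
    box_stats <- trips %>%
      cmap(~ .x %>%
             group_by(on, off) %>%
             summarise(ymin = min(time), lower = quantile(time, .25),
                       middle = median(time), upper = quantile(time, .75),
                       ymax = max(time), .groups = "drop")) %>%
      collect()
    # plot from the precomputed five-number summaries
    ggplot(box_stats, aes(x = interaction(on, off))) +
      geom_boxplot(aes(ymin = ymin, lower = lower, middle = middle,
                       upper = upper, ymax = ymax), stat = "identity")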
0
votes
1 answer

Columns jumbled after using csv_to_disk.frame

I have around 15 GB of zipped data in 30-minute packages. Unzipping and reading them with either unzip and readr or fread works just fine, but the RAM requirements don't allow me to read in as many files as I wish, so I've tried to use the disk.frame…
D.J
  • 1,180
  • 1
  • 8
  • 17
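Jumbled columns after a chunked import often mean the chunks were parsed with inconsistent schemas; pinning the header and column types is one way to rule that out. A hedged sketch (file name, column names, and types are assumptions; colClasses is forwarded to the underlying reader):

    library(disk.frame)
    setup_disk.frame()
    df <- csv_to_disk.frame("readings.csv", outdir = "readings.df",
                            header = TRUE,
                            colClasses = list(character = "station",
                                              numeric = c("temp", "wind")))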
0
votes
1 answer

How should we choose the compression rate with rbindlist.disk.frame?

It's set to 50 by default on a scale of 1 to 100. I have an especially large disk.frame and I'm considering using a higher number. What are the important trade-offs to consider?
Cauder
  • 2,157
  • 4
  • 30
  • 69
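disk.frame stores its chunks as fst files, and compress maps onto fst's 0-100 compression level, so the trade-off is essentially write-time CPU versus file size; reads stay comparatively cheap. A minimal sketch of passing a higher level (paths assumed):

    library(disk.frame)
    setup_disk.frame()
    # compress = 100: smallest chunk files, slowest writes
    big <- rbindlist.disk.frame(list(df1, df2), outdir = "combined.df",
                                compress = 100)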
0
votes
2 answers

My group by doesn't appear to be working in disk.frame

I ran a group by on a large dataset (>20 GB) and it doesn't appear to be working quite right. This is my code: mydf[, .(value = n_distinct(list_of_id, na.rm = T)), by = .(week), keep = c("list_of_id",…
Cauder
  • 2,157
  • 4
  • 30
  • 69
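A likely cause: data.table syntax on a disk.frame runs once per chunk, so the distinct count is computed within each chunk, and per-chunk counts cannot simply be added. Resharding so each week lives in exactly one chunk makes the per-chunk count globally correct; a hedged sketch:

    library(disk.frame)
    library(data.table)
    setup_disk.frame()
    # put each week entirely inside one chunk
    mydf_wk <- shard(mydf, shardby = "week", outdir = "by_week.df",
                     overwrite = TRUE)
    res <- mydf_wk[, .(value = uniqueN(list_of_id, na.rm = TRUE)),
                   by = .(week),
                   keep = c("list_of_id", "week")]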
0
votes
1 answer

How does srckeep affect the underlying disk frame?

I have a disk.frame with these columns: key_a, key_b, key_c, value. Say the disk.frame is 200M rows and I'd like to group it by key_b. Additionally, I want to keep the underlying disk.frame intact and unchanged so I can later join it to something…
Cauder
  • 2,157
  • 4
  • 30
  • 69
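srckeep() only restricts which columns are read from disk for that one query; the files themselves are never rewritten, so the original disk.frame stays intact for a later join. A minimal sketch of a two-stage sum by key_b:

    library(disk.frame)
    library(dplyr)
    setup_disk.frame()
    by_b <- mydf %>%
      srckeep(c("key_b", "value")) %>%   # read 2 of the 4 columns
      chunk_group_by(key_b) %>%
      chunk_summarize(value = sum(value)) %>%
      collect() %>%
      group_by(key_b) %>%
      summarize(value = sum(value))
    # mydf on disk still holds key_a, key_b, key_c, value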
0
votes
1 answer

How do I bind two disk frames together?

I have two disk.frames, and each is about 20 GB worth of files. They are too big to merge as data.tables because the process requires more memory than I have available. I tried using this code: output <- rbindlist(list(df1, df2)). The wrinkle is that…
Cauder
  • 2,157
  • 4
  • 30
  • 69
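rbindlist.disk.frame() binds the chunk files on disk rather than in memory, so only one chunk needs to be in RAM at a time. A minimal sketch (output path assumed):

    library(disk.frame)
    setup_disk.frame()
    output <- rbindlist.disk.frame(list(df1, df2), outdir = "bound.df")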
0
votes
1 answer

How do I find out how many workers my disk.frame is using?

I am using the disk.frame package and I want to know how many workers disk.frame is using to perform its operations. I looked through the disk.frame documentation and can't find such a function.
xiaodai
  • 14,889
  • 18
  • 76
  • 140
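disk.frame parallelises through the future package, so future's own query reports the live worker count. A minimal sketch:

    library(disk.frame)
    setup_disk.frame(workers = 4)   # request 4 workers
    future::nbrOfWorkers()          # how many are actually in use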
-1
votes
2 answers

Does disk.frame allow working with large lists in R?

I am producing very big datasets (>120 GB), which are actually lists of named (100x100x3) matrices: very large lists (millions of records). They are then fed to a CNN and classified into one of 4 categories. Processing this amount of data at once…
ramen
  • 691
  • 4
  • 20
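disk.frame chunks are tabular fst files, so a list of 100x100x3 matrices does not map onto it directly. A plain-R alternative under that assumption is to persist each matrix (or batch) as its own file and stream them into training; a hedged sketch with hypothetical names:

    # mat_list is the in-memory list of named 100x100x3 matrices
    dir.create("batches", showWarnings = FALSE)
    for (i in seq_along(mat_list)) {
      saveRDS(mat_list[[i]], file.path("batches", sprintf("m%06d.rds", i)))
    }
    # later, inside the training loop, read one file at a time
    first <- readRDS(file.path("batches", "m000001.rds"))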