
Windows 10 64-bit, 32 GB RAM, RStudio 1.1.383 and R 3.4.2 (up to date)

I have several CSV files that contain at least one or two lines full of NUL bytes. So I wrote a script that uses read_lines_raw() from the readr package, which reads the file in raw format. It produces a list where each element is one row. I then check each row for 0x00 (the NUL byte), and any line containing it gets deleted.
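The workflow described above might be sketched like this (a minimal illustration, not the asker's actual script; the function names `remove_nul_lines`, `path`, and `out_path` are assumptions):

```r
# Sketch: read a file line-by-line as raw vectors, drop lines containing NUL.
# read_lines_raw() is provided by the readr package.
library(readr)

remove_nul_lines <- function(path, out_path) {
  rows <- read_lines_raw(path)   # list of raw vectors, one element per line
  has_nul <- vapply(rows, function(r) any(r == as.raw(0)), logical(1))
  clean <- rows[!has_nul]        # keep only lines without a NUL byte

  # write the surviving lines back out, restoring the newline separators
  con <- file(out_path, open = "wb")
  on.exit(close(con))
  for (r in clean) {
    writeBin(r, con)
    writeBin(charToRaw("\n"), con)
  }
}
```

This works for the smaller files but, as described below, fails on the 2.5 GB file.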

One of the files is 2.5 GB in size and also has NUL bytes somewhere in it. The problem is that read_lines_raw() cannot read this file and throws an error:

Error in read_lines_raw_(ds, n_max = n_max, progress = progress) : 
  negative length vectors are not allowed

I don't understand the problem. Some of my research hints at something size-related, but not even half of the RAM is in use, and other files it read successfully were 1.5 GB. Is this file too big, or is something else causing the error?
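One way to sidestep per-line or per-string limits entirely is to stream the file through readBin() in fixed-size raw chunks. This is a hedged sketch, not the asker's code; note that it strips individual NUL bytes rather than deleting whole NUL-containing lines, which differs from the original approach:

```r
# Stream a large file in raw chunks and write it back out with NUL bytes
# removed, so no single R string or vector ever needs to hold the whole file.
strip_nul_bytes <- function(path, out_path, chunk_size = 64 * 1024^2) {
  inp  <- file(path, open = "rb")
  outp <- file(out_path, open = "wb")
  on.exit({ close(inp); close(outp) })
  repeat {
    chunk <- readBin(inp, what = "raw", n = chunk_size)
    if (length(chunk) == 0) break          # end of file
    writeBin(chunk[chunk != as.raw(0)], outp)
  }
}
```

After cleaning, the output file can be read with the usual CSV readers.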

Update 1:

I tried to read in the whole file using scan(), but that also gave an error:

could not allocate memory (2048 Mb) in C function 'R_AllocStringBuffer'

So although my PC has 32 GB of RAM, the maximum allowed size for a single object is 2 GB? I checked to make sure I am running 64-bit R, and I am:

> version
               _                           
platform       x86_64-w64-mingw32          
arch           x86_64                      
os             mingw32                     
system         x86_64, mingw32             
status                                     
major          3                           
minor          4.2                         
year           2017                        
month          09                          
day            28                          
svn rev        73368                       
language       R                           
version.string R version 3.4.2 (2017-09-28)
nickname       Short Summer

It seems many people face similar issues, but I could not find a solution. How can I increase the memory allocation for an individual object? memory.limit() returns 32 GB, which is the RAM size, but that isn't helpful. memory.size() returns something close to 2 GB, and since the file is 2.7 GB on disk, I assume this is the reason for the error.
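For context (an explanatory note, not part of the original question): the 2048 Mb figure likely reflects R's per-string cap rather than total memory. A single R character string cannot exceed 2^31 - 1 bytes, independent of how much RAM the machine has:

```r
# The limit on a single string's length in bytes is tied to R's
# maximum integer value, not to memory.limit() or physical RAM.
.Machine$integer.max
# [1] 2147483647   i.e. 2 GB minus one byte
```

If that is the cause, no memory setting will raise it; the file has to be processed in pieces instead of as one string.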

Thank you.
