Windows 10 64-bit, 32 GB RAM, RStudio 1.1.383 and R 3.4.2 (up to date)
I have several csv files, each of which contains at least one or two lines full of nul values. So I wrote a script that uses read_lines_raw() from the readr package, which reads a file in raw format and returns a list with one raw vector per line. I then check each line for 00 (the nul byte) and delete any line that contains it.
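For reference, a minimal sketch of that approach might look like the following (file names are placeholders, and this assumes readr's read_lines_raw(), which returns one raw vector per line):

library(readr)

lines   <- read_lines_raw("input.csv")                                 # one raw vector per line
has_nul <- vapply(lines, function(x) any(x == as.raw(0)), logical(1))  # TRUE if the line contains a 00 byte
clean   <- lines[!has_nul]                                             # keep only lines without nul bytes
write_lines(vapply(clean, rawToChar, character(1)), "input_clean.csv") # safe to convert: no nuls remain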
One of the files is 2.5 GB in size and also has nul values somewhere in it. The problem is that read_lines_raw() is not able to read this file and throws an error:
Error in read_lines_raw_(ds, n_max = n_max, progress = progress) :
  negative length vectors are not allowed
I don't really understand the problem. My research hints at something related to size, but not even half of the RAM is being used, and some other files it was able to read were 1.5 GB in size. Is this file too big, or is something else causing this?
Update 1:
I tried to read in the whole file using scan(), but that also gave me an error:
could not allocate memory (2048 Mb) in C function 'R_AllocStringBuffer'
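The scan() call was along these lines (the exact arguments shown here are a reconstruction for illustration, not the literal call):

x <- scan("input.csv", what = character(), sep = "\n", quiet = TRUE)  # read the whole file as lines of text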
So although my PC has 32 GB of RAM, the maximum allowed size for a single entity is 2 GB? I also checked to make sure I am running 64-bit R, and I am:
> version
               _
platform       x86_64-w64-mingw32
arch           x86_64
os             mingw32
system         x86_64, mingw32
status
major          3
minor          4.2
year           2017
month          09
day            28
svn rev        73368
language       R
version.string R version 3.4.2 (2017-09-28)
nickname       Short Summer
It seems like many people are facing similar issues, but I could not find a solution. How can we increase the memory allocation for individual entities? memory.limit() reports 32 GB, which is the RAM size, but that isn't helpful. memory.size() gives something close to 2 GB, and since the file is 2.7 GB on disk, I assume this is the reason for the error.
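For what it's worth, this is roughly how I checked (both functions are Windows-only and report megabytes; the commented values are approximate):

memory.limit()  # about 32000 Mb, i.e. the full 32 GB of RAM
memory.size()   # about 2000 Mb, i.e. roughly 2 GB currently used by the R session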
Thank you.