I am using Windows 7 (32-bit) with 4 GB of RAM, of which only about 3 GB is addressable because of the 32-bit limit. After shutting everything else down, I have roughly 1 GB cached and 1 GB available before starting; the "free" figure varies and is sometimes 0.
Using quanteda, I read a twitter.txt file with the textfile() command, which successfully creates a 157 MB corpusSource object. But when I take the next step and convert it with the corpus() command, R blasts through it and produces a tiny, effectively empty corpus whose four summary fields all contain 0. Code and output follow:
twitterfile <- "./final/en_US/en_US.twitter.txt"
precorp <- textfile(twitterfile)
corp <- corpus(precorp)
summary(corp)
Corpus consisting of 1 document.
Text Types Tokens Sentences
en_US.twitter.txt 0 0 0
Source: C:/R_Data/Capstone/* on x86 by xxxxx
Created: Thu Aug 18 06:32:01 2016
Notes:
Warning message:
In nsentence.character(object, ...) :
nsentence() does not correctly count sentences in all lower-cased text
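For what it's worth, a base-R sanity check along these lines (a minimal sketch using a throwaway sample file in place of the real en_US.twitter.txt, whose path is assumed as above) can confirm whether the raw file itself reads back non-empty; skipNul = TRUE is relevant because embedded NUL bytes in raw Twitter dumps can silently truncate reads on Windows:

```r
# Create a small stand-in file (hypothetical; substitute the real
# en_US.twitter.txt path to check the actual data).
sample_file <- tempfile(fileext = ".txt")
writeLines(c("first tweet", "second tweet", "third tweet"), sample_file)

# skipNul = TRUE guards against embedded NUL bytes, which can otherwise
# cause readLines() to stop early or drop content.
lines <- readLines(sample_file, encoding = "UTF-8", skipNul = TRUE)

length(lines)      # number of lines actually read
sum(nchar(lines))  # total characters, as a rough size check
```

If the real file reads back with a plausible line count and character total here, the problem is more likely in the textfile()/corpus() step than in the file itself.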
Any insights into why this might be happening?