I am using the tm and Snowball packages in R for text mining. I first ran my code on a laptop running Windows 7 with 8 GB of memory, and later ran the same code on a Linux (Ubuntu) machine with 64 GB of memory. Both machines are 64-bit, and I am using the 64-bit version of R on both. However, the Windows machine has R 3.0.0, whereas the Linux machine has R 2.14.
Some of the commands are extremely slow on Linux compared to Windows.
Corpus Command
On Windows
d <- data.frame(chatTranscripts$chatConcat)
ds <- DataframeSource(d)
t1 <- Sys.time()
dsc <- Corpus(ds)
print(Sys.time() - t1)
Time difference of 46.86169 secs
This took only about 47 seconds on the Windows machine.
On Linux
t1 <- Sys.time()
dsc <- Corpus(ds)
print(Sys.time() - t1)
Time difference of 3.674376 mins
This took around 220 seconds on the Linux machine, roughly 4.7 times as long as on Windows.
Snowball Stemming
On Windows
t1 <- Sys.time()
dsc <- tm_map(dsc, stemDocument)
print(Sys.time() - t1)
Time difference of 12.05321 secs
This took only about 12 seconds on the Windows machine.
On Linux
t1 <- Sys.time()
dsc <- tm_map(dsc, stemDocument)
print(Sys.time() - t1)
Time difference of 4.832964 mins
This took around 290 seconds on the Linux machine, roughly 24 times as long as on Windows.
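Because print(Sys.time() - t1) picks its own units (seconds on Windows, minutes on Linux in the output above), the numbers are awkward to compare directly. Below is a minimal sketch of how the timings could be reported in seconds on both machines, together with the R version. The helper elapsed_secs is a hypothetical name of my own, not part of tm, and Sys.sleep is only a stand-in workload so the snippet runs without my data.

```r
# Hypothetical helper (not part of tm): evaluate an expression and
# return the elapsed wall-clock time in seconds as a plain number.
elapsed_secs <- function(expr) {
  start <- Sys.time()
  force(expr)  # evaluates the expression supplied by the caller
  as.numeric(difftime(Sys.time(), start, units = "secs"))
}

# Stand-in workload; in the real test this would be
# Corpus(ds) or tm_map(dsc, stemDocument).
secs <- elapsed_secs(Sys.sleep(0.2))

# Report the R version next to the timing so runs on
# different machines can be compared side by side.
cat(R.version.string, "\n")
cat(sprintf("Elapsed: %.2f secs\n", secs))
```

Using difftime with units = "secs" fixes the unit, so the same script gives directly comparable output on Windows and Linux.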
Is there a way to speed up these commands on the Linux machine? Could the difference in R versions alone account for such a large gap? Thank you.
Ravi