I'm a bit of an R newbie, but I'm trying to use the reshape2 (acast), ff, and ffbase packages to create a pretty large R object (5 million rows x ~17k columns). The data is brought in from SQL Server, melted, and then cast into smaller data frame chunks, since the full object can't be created with one simple melt and dcast. All of the chunks have the same columns. When I get to ffdfappend, R crashes with the Windows error "R for Windows GUI front-end has stopped working". My main question at this point: am I using ffdfappend correctly?
Environment: Windows Server 2008 (64-bit), R 2.15.3 (64-bit)
My "Chunking Up" Code:
library(RODBC);
library(reshape2);
library(ff);
library(ffbase);
db <- odbcConnect(foo);
#aggregation function used in the acast
iSequence <- function(x) {
  #min() returns NA if any value in x is NA
  if (is.na(min(x))) {
    return('N');
  } else if (min(x) == 1) {
    return('P');
  } else {
    return('S');
  }
}
#pull in data from SQL Server
data.raw <- sqlQuery(db,"
  SELECT
    key
    ,type
    ,val
  FROM table
  WHERE val IS NOT NULL
",stringsAsFactors=FALSE);
#melt
data.melt <- melt(data.raw,id=c("key","type"),measure="val");
#get list of unique first 4 digits of key - good enough granularity for chunk
data.char <- unique(substr(data.melt$key,1,4));
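#create list where the chunked casts will reside, named by chunk key
#so that data.df.list[[i]] fills an existing slot instead of appending a new one
data.df.list <- vector("list",length(data.char));
names(data.df.list) <- data.char;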
#get list of unique column names
data.type.unique <- unique(data.melt$type);
#chunk counter
chunk.count <- 1;
#cast by chunk
for(i in data.char) {
  print(paste(chunk.count, '/', length(data.char)));
  tempcast <- acast(data.melt[substr(data.melt$key,1,4)==i,],key~type,fun.aggregate=iSequence);
  #create matrix filled with 'N', one column per unique type
  templist <- matrix(
    data="N",
    nrow=nrow(tempcast),
    ncol=length(data.type.unique),
    dimnames=list(rownames(tempcast),data.type.unique)
  );
  #overwrite the columns present in tempcast, matched by name; the rest stay 'N'
  templist[,colnames(tempcast)] <- tempcast;
  #put into final cast list
  data.df.list[[i]] <- as.data.frame(templist);
  rm(tempcast);
  rm(templist);
  gc();
  chunk.count <- chunk.count + 1;
}
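The fill step in the loop can be sketched in isolation with base R only. This is a toy illustration, not my real data: the names all.types, chunk, and filled and all the values are made up. It shows a chunk that only has some of the columns being written into an all-'N' matrix, here assigning by column name so the column order of the chunk doesn't matter:

```r
# Toy stand-ins (hypothetical names and values, not the real data)
all.types <- c("b", "a", "c")            # full column set, in arbitrary order
chunk <- matrix(
  c("P", "S", "S", "P"),
  nrow = 2,
  dimnames = list(c("k1", "k2"), c("a", "c"))  # this chunk lacks column "b"
)

# Template filled with 'N', one column per known type
filled <- matrix(
  data = "N",
  nrow = nrow(chunk),
  ncol = length(all.types),
  dimnames = list(rownames(chunk), all.types)
)

# Assign by column name so each value lands in the matching column,
# regardless of how the two column orders differ
filled[, colnames(chunk)] <- chunk

print(filled)
```

Column "b" stays 'N' for every row, while "a" and "c" take the values from the chunk.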
So now I have all the chunks, and the data looks fine and valid. When I test ffdfappend on two chunks (these are key values that work individually), R crashes:
#this works
#unsort is in there because I get an error saying this is sorted otherwise
t1 <- NULL;
t1 <- unsort(ffdfappend(t1,data.df.list[["1174"]]));
t2 <- NULL;
t2 <- unsort(ffdfappend(t2,data.df.list[["1175"]]));
#this crashes R
t1 <- ffdfappend(t2,t1);
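On the assumption that the append needs both pieces to line up column for column, one sanity check is to compare the column names and classes of two chunks before appending. A rough sketch in base R, where same_layout is a hypothetical helper (it is not part of ff or ffbase) and the data frames are toy stand-ins with made-up values:

```r
# Hypothetical helper: TRUE when two data frames have identical
# column names (in the same order) and identical column classes
same_layout <- function(a, b) {
  identical(names(a), names(b)) &&
    identical(sapply(a, class), sapply(b, class))
}

# Toy stand-ins for chunks (made-up values)
lv <- c("N", "P", "S")
chunk.a <- data.frame(x = factor(c("N", "P"), levels = lv),
                      y = factor(c("S", "N"), levels = lv))
chunk.b <- data.frame(x = factor(c("P", "P"), levels = lv),
                      y = factor(c("N", "S"), levels = lv))
# Same columns as chunk.a but in a different order
chunk.c <- data.frame(y = factor("N", levels = lv),
                      x = factor("P", levels = lv))

print(same_layout(chunk.a, chunk.b))
print(same_layout(chunk.a, chunk.c))
```

On the real list this would be called as same_layout(data.df.list[["1174"]], data.df.list[["1175"]]).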
Am I using ffdfappend correctly? Thanks!