
I'm a bit of an R newbie, but I'm trying to use reshape2's acast together with the ff and ffbase packages to build a pretty large R object (5 million rows x ~17k columns). A SQL Server dataset is pulled in, melted, and cast into smaller data.frame chunks (of course, I can't create this object with one simple melt and dcast). All of the chunks have the same columns. When I get to ffdfappend, R crashes with the Windows crash dialog ("R for Windows GUI front-end has stopped working"). My main question at this point: am I using ffdfappend correctly?

Environment: Windows Server 2008 (64-bit), R 2.15.3 (64-bit)

My "Chunking Up" Code:

library(RODBC);
library(reshape2);
library(ff);
library(ffbase);

db <- odbcConnect(foo);

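#aggregation function used by acast; acast also calls it on a zero-length
#vector to get the fill value for key/type combinations that have no rows
iSequence <- function(x){
 if(length(x) == 0 || any(is.na(x))) {
  return('N');
 } else {
  if(min(x) == 1) {
   return('P');
  } else {
   return('S');
  }
 }
}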

#pull in data from SQL Server
data.raw <- sqlQuery(db,"
SELECT 
key
,type
,val
FROM table
WHERE val IS NOT NULL
",stringsAsFactors=FALSE);

#melt
data.melt <- melt(data.raw,id=c("key","type"),measure="val");

#get the unique first 4 characters of key: granular enough for chunking
data.char <- unique(substr(data.melt$key,1,4));

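#create a list, named by key prefix, where the chunked casts will reside
#(without names, data.df.list[[i]] would append new elements after the NULL slots)
data.df.list <- setNames(vector("list",length(data.char)),data.char);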

#get list of unique column names
data.type.unique <- unique(data.melt$type);

#chunk counter
chunk.count <- 1;

#cast by chunk 
for(i in data.char) {
 print(paste(chunk.count, '/', length(data.char)));
 tempcast <- acast(data.melt[substr(data.melt$key,1,4)==i,],key~type,fun.aggregate=iSequence);
 #create a matrix of 'N' covering every column in data.type.unique
 templist <- matrix(
  data="N",
  nrow=nrow(tempcast),
  ncol=length(data.type.unique),
  dimnames=list(rownames(tempcast),data.type.unique)
 );
 #fill in the columns that tempcast has, matching by column name
 #(acast sorts its columns, so filling by position can misalign them)
 templist[,colnames(tempcast)] <- tempcast;
 #put into final cast list
 data.df.list[[i]] <- as.data.frame(templist);
 rm(tempcast);
 rm(templist);
 gc();
 chunk.count <- chunk.count + 1;
}
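Two details of the casting loop can silently corrupt the chunks before ff is ever involved, so they may be worth ruling out first. The reshape2 documentation says acast's default fill is the result of applying fun.aggregate to a zero-length vector, and min(numeric(0)) is Inf (with a warning), so an iSequence that only tests is.na(min(x)) yields 'S' rather than 'N' for empty key/type cells. Separately, as far as I can tell acast orders its columns by the sorted values of type, whereas unique() keeps first-appearance order, so filling templist by position (which(... %in% ...)) instead of by column name can put values under the wrong column. The sketch below assumes val is numeric; the toy key, type names, and values are made up, and iSequence.orig / iSequence.safe are hypothetical names used only for the comparison.

```r
# Pitfall 1: acast calls fun.aggregate on a zero-length vector to get the
# default fill for key/type combinations that have no rows.
iSequence.orig <- function(x) {          # same logic as the original iSequence
  if (is.na(min(x))) return("N")
  if (min(x) == 1) "P" else "S"
}
# min(numeric(0)) is Inf (with a warning), so empty cells come out as "S"
empty.orig <- suppressWarnings(iSequence.orig(numeric(0)))

# Guarded version: empty groups and groups containing NA both map to "N"
iSequence.safe <- function(x) {
  if (length(x) == 0 || any(is.na(x))) return("N")
  if (min(x) == 1) "P" else "S"
}
empty.safe <- iSequence.safe(numeric(0))

# Pitfall 2: positional vs name-based column fill (hypothetical toy data)
all.types <- c("typeB", "typeA", "typeC")   # first-appearance order, like unique()
tempcast <- matrix(c("P", "S"), nrow = 1,   # sorted columns, like acast output
                   dimnames = list("1174001", c("typeA", "typeB")))

# Positional fill: which() returns 1:2, i.e. typeB and typeA, so values swap
pos <- matrix("N", nrow = 1, ncol = 3, dimnames = list("1174001", all.types))
pos[, which(all.types %in% colnames(tempcast))] <- tempcast

# Name-based fill: each value lands under the column with the same name
byname <- matrix("N", nrow = 1, ncol = 3, dimnames = list("1174001", all.types))
byname[, colnames(tempcast)] <- tempcast

print(c(orig = empty.orig, safe = empty.safe))
print(rbind(positional = pos[1, ], by.name = byname[1, ]))
```

If either pitfall applies to the real data, the chunks will still have identical column names, so the problem would not show up as a schema error later; it would only show up as wrong values.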

So now I have all the chunks, great (the data looks fine and valid). When I test ffdfappend on two chunks (both of these keys index valid chunks), R crashes:

#this works 
#unsort() is in there because otherwise I get an error saying the object is sorted
t1 <- NULL;
t1 <- unsort(ffdfappend(t1,data.df.list[["1174"]]));
t2 <- NULL;
t2 <- unsort(ffdfappend(t2,data.df.list[["1175"]]));

#this crashes R
t1 <- ffdfappend(t2,t1);
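Before digging into ff itself, it may help to rule out a schema mismatch between the chunks being appended (different column names, column order, or column classes). The sketch below is plain base R and assumes each chunk is an ordinary data.frame like the ones stored in data.df.list; chunks.consistent is a hypothetical helper, not part of ff or ffbase, and the toy chunks are made up.

```r
# Hypothetical helper (not part of ff/ffbase): TRUE if every non-NULL chunk
# has the same column names, in the same order, with the same column classes
chunks.consistent <- function(chunks) {
  chunks <- Filter(Negate(is.null), chunks)   # drop empty (NULL) list slots
  if (length(chunks) < 2) return(TRUE)        # nothing to compare
  ref <- chunks[[1]]
  ref.classes <- vapply(ref, function(col) class(col)[1], "")
  same <- vapply(chunks, function(d) {
    identical(names(d), names(ref)) &&
      identical(vapply(d, function(col) class(col)[1], ""), ref.classes)
  }, logical(1))
  all(same)
}

# Toy chunks standing in for data.df.list[["1174"]], data.df.list[["1175"]], ...
a     <- data.frame(typeA = factor("P"), typeB = factor("N"))
b     <- data.frame(typeA = factor("S"), typeB = factor("P"))
d.bad <- data.frame(typeB = factor("S"), typeA = factor("P"))  # columns reordered
e.bad <- data.frame(typeA = "S", typeB = factor("P"),
                    stringsAsFactors = FALSE)                  # character column

ok.same  <- chunks.consistent(list(a, b, NULL))  # NULL slot is ignored
ok.order <- chunks.consistent(list(a, d.bad))
ok.class <- chunks.consistent(list(a, e.bad))
print(c(same = ok.same, order = ok.order, class = ok.class))
```

If the check passes on the real list, another thing worth trying is appending each data.frame chunk into a single ffdf in a loop (start with big <- NULL, then big <- ffdfappend(big, data.df.list[[nm]]) for each chunk name nm). That uses ffdfappend the same way as the calls that already work above (NULL or an ffdf plus a data.frame) instead of appending one ffdf to another; whether it avoids this particular crash is untested.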

Am I using ffdfappend correctly? Thanks!

  • What do you mean with 'R crashes'. Does it segfault or is it something else? –  Apr 02 '13 at 20:07
  • Specifically, this: "R crashes - with the Windows crash (R for Windows GUI front-end has stopped working)" is a bit ambiguous. Could you elaborate? – joran Apr 02 '13 at 20:09
  • Sorry! R goes grey and I get a popup titled "R for Windows GUI front-end has stopped working" with the options "Check online for a solution and close the program", "Close the program", or "Debug the program" (plus "View problem details"). I'm recreating the crash right now to get the problem details (I remember it showing Problem Event Name: APPCRASH, along with the version, exception code, etc.). Thanks! – user2183510 Apr 02 '13 at 22:00
  • This might be an issue in package ff which is solved in version 2.2.12 (https://r-forge.r-project.org/scm/viewvc.php/pkg/ff/NEWS?view=markup&root=ff: ff no longer segfaults when using closed ff objects, e.g. if an integer ff index was used and index or object was closed). Have you tried out version 2.2.12 of the ff package to see if this solves your issue? –  Apr 03 '13 at 09:10

0 Answers