I have a problem very similar to this one: "Populating a data frame in R in a loop". I cannot seem to populate my matrix in a loop.

library(plyr)  # provides ddply()
myDF <- read.csv('corpusFiltered.txt.gz', header = TRUE, sep = '\t')
phylum = sort(unique(myDF$PHYLUM))
myDF.mean = ddply(myDF, .(ENVIRONMENT, FILENAME, PHYLUM), summarize, MeanX = mean(X, na.rm=TRUE) ) 

df_all = myDF.mean[c(4, 3)] #select only the X and Phylum
c_all = unstack(df_all) #restructure dataframe

columnPhylum1 = matrix(ncol=1, nrow=length(phylum))

GET_X = function(dataset)
{
   for (i in 1:length(phylum))
   {   
      print(phylum[i]) 
      columnPhylum1[i,] <- phylum[i] #this does not populate the matrix. still 'NA'
   }   
}
GET_X(c_all)
print('')
print(columnPhylum1)

This does not work. The output is:

[1] Actinobacteria
Levels: Actinobacteria Bacteroidetes Chlamydiae Crenarchaeota Deinococcus-Thermus Euryarchaeota Firmicutes Proteobacteria Spirochaetes Tenericutes ***
[1] Bacteroidetes
[1] Chlamydiae
[1] Crenarchaeota
[1] Deinococcus-Thermus
[1] Euryarchaeota
[1] Firmicutes
[1] Proteobacteria
[1] Spirochaetes
[1] Tenericutes
[1] ""
      [,1]
 [1,]   NA  
 [2,]   NA  
 [3,]   NA  
 [4,]   NA  
 [5,]   NA  
 [6,]   NA  
 [7,]   NA  
 [8,]   NA  
 [9,]   NA  
[10,]   NA 

***For brevity, I removed the subsequent "Levels" info from all but the first prokaryote (Actinobacteria).

However, if I make a faux matrix...

sig= matrix(ncol=1, nrow=length(phylum))
for (i in 1:length(phylum)){sig[i,]<-i}
print(sig)

This works like a charm.

     [,1]
 [1,]    1   
 [2,]    2   
 [3,]    3   
 [4,]    4   
 [5,]    5   
 [6,]    6   
 [7,]    7   
 [8,]    8   
 [9,]    9   
[10,]   10  

Perhaps I cannot see the forest for the trees; I have checked for obvious things (e.g. correct variable names) and have been unable to find any problems. The only difference I can see is that the first version runs the loop inside a function. I don't understand why I am getting different behavior from 'identical' code. Any assistance is greatly appreciated.

2 Answers

I guess the problem is that you are trying to copy values of a factor variable into a numeric matrix (which is why your "faux" matrix works: it inserts numbers, not factors). I don't really see why you need a matrix, since you are copying a vector that you already have. In any case, maybe this simple code is what you are really trying to do:

vect <- factor(c('foo', 'bar', 'foo'), levels = c('foo', 'bar'))
mat <- as.matrix(vect)
mat

which outputs:

    [,1] 
[1,] "foo"
[2,] "bar"
[3,] "foo"

Edit: In your specific case, this would translate into:

columnPhylum1 <- as.matrix(phylum)
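
Separately from the factor question, it may be worth checking the scoping. The loop in the question runs inside GET_X, and in R an assignment with `<-` inside a function changes a local copy, so the global columnPhylum1 would stay NA whatever is assigned. A minimal sketch, assuming a made-up `phylum` vector and a hypothetical helper `fill_column` (neither is from the original code), that builds the matrix inside the function and returns it:

```r
# Stand-in data; the real `phylum` comes from the asker's file.
phylum <- factor(c("Actinobacteria", "Bacteroidetes", "Chlamydiae"))

# Assigning inside a function only touches a local copy of `m`.
m <- matrix(NA, nrow = length(phylum), ncol = 1)
no_effect <- function() {
  m[1, 1] <- 1  # creates and modifies a local `m`
  invisible(NULL)
}
no_effect()
print(all(is.na(m)))  # the global `m` is untouched

# Hypothetical helper: fill a local matrix and return it to the caller.
fill_column <- function(values) {
  out <- matrix(NA_character_, nrow = length(values), ncol = 1)
  for (i in seq_along(values)) {
    out[i, 1] <- as.character(values[i])  # store the label, not the factor code
  }
  out
}

columnPhylum1 <- fill_column(phylum)
print(columnPhylum1)
```

The caller keeps whatever the function returns, so nothing depends on modifying a global from inside the loop.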
  • You are correct about the code; I want to do something else & this is a way to verify the process. You are correct about why the faux matrix works. Let me try my function with the variables I actually want to capture. If it works, I will accept your answer. – cer Nov 29 '14 at 18:30
  • Francisco --> Your answer does not quite solve my problem, but you have led me closer to the source of the issue and closer to asking a better question (and hopefully a solution in the future). For that reason, I am going to accept your answer. Thank you. – cer Nov 29 '14 at 20:22

This may relate to your phylum vector being a factor.

Try coercing phylum into a character vector:

char_phyla <- as.character(phylum)

If that works, you can read the csv data without it auto-factoring:

myDF <- read.csv('corpusFiltered.txt.gz', header = TRUE, sep = '\t', stringsAsFactors = FALSE)
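
To see what the flag changes, here is a small sketch that reads an inline tab-separated string instead of the real file. The two-row data set is invented for illustration; only the `PHYLUM` and `X` column names come from the question. `stringsAsFactors` is set explicitly in both calls because its default differs between R versions (it became FALSE in R 4.0.0).

```r
# Made-up stand-in for corpusFiltered.txt.gz.
tsv <- "PHYLUM\tX\nFirmicutes\t1.5\nTenericutes\t2.0\n"

# Same data read twice, once with factors and once without.
as_factor <- read.csv(text = tsv, sep = "\t", stringsAsFactors = TRUE)
as_char   <- read.csv(text = tsv, sep = "\t", stringsAsFactors = FALSE)

print(class(as_factor$PHYLUM))  # factor column
print(class(as_char$PHYLUM))    # plain character column

# Coercing the factor afterwards gives the same labels.
print(identical(as.character(as_factor$PHYLUM), as_char$PHYLUM))
```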