
Edit: The packages used are plyr and vegan. R is the most up-to-date version.

My base data is this:

X1 = c('Archea01', 'Bacteria01', 'Bacteria02') 
Sample1 = c(0.2,NA,NA) 
Sample2 = c(0, 0.001, NA) 
Sample3 = c(0.04, NA, NA)
df = data.frame(X1,Sample1,Sample2,Sample3)
df
          X1 Sample1 Sample2 Sample3
1   Archea01     0.2   0.000    0.04
2 Bacteria01      NA   0.001      NA
3 Bacteria02      NA      NA      NA

The data were purposefully made with NAs to reflect the real data.

My goal is to sum the frequency of bacterial/archaeal occurrence in each sample, which would ideally create this type of data frame:

Sample1    Sample2    Sample3
23         11         12

I have managed to create a list of frequency:

dfFreq <- apply(df, 2, count)

Although this looks good, it's not quite what I want:

head(dfFreq)[2]
$Sample2
         x    freq
1       0.000  23
2       0.001   5
3       <NA>   50  

The next logical step would be to convert the list into a dataframe and sum frequency (or vice versa), but my code has not worked. I have tried:

 df.data <- ldply (dfFreq, data.frame)
 dfSUM <- apply(dfFreq, 2, sum)

Trying to sum the list simply hasn't worked (unsurprisingly). Regarding transforming into a dataframe, I have looked all over Stack Overflow and have seen a lot suggesting the above or lapply, but the data frame that is created from the code suggested is:

 x           freq
 Archea01    1
 Bacteria01  1
 etc         etc

Which is not what I want.

Any thoughts about how to either A) sum frequency and then convert into a data frame like the one I want, or B) convert the list into a sensible data frame whose frequency column can be summed? I think A is the only way I can get to the point I want, but any thoughts about this would be greatly appreciated.

Edit 2.0: Ryan Morton suggested the following code:

require(dplyr)
dfBound <- rbind(dfFreq)

Which has resulted in this data frame:

        X1                                  Sample1
dfFreq list(x = 1:1885, freq = c(1, 1, 1)   list(x = c(1, 2, 3)

Although this certainly seems closer to the solution, I notice that each list follows either the format of X1 or the format of Sample1 (x = c(1, 2, 3, etc.)), which suggests that something went wrong while binding the lists.

Any ideas of why this may not be working, and what solution there may be for summing the frequency found within the list?

Thanks very much.

E.O.
  • I don't understand how the sample data that you provide produces the frequencies that you mention. Please elaborate or provide data / output that matches. Also, `count` is not a base R function. If you are using any packages, mention them explicitly or add their tag. – lmo Jan 24 '17 at 20:30
  • I'd rbind() the list of data frames and then sum the frequencies. Using dplyr's group_by function should work: df %>% group_by(x) %>% summarise(freq = sum(freq)). If you need the sample name to come through, you need to add the sample name to each data frame (and add that variable to the group_by function). – Ryan Morton Jan 24 '17 at 20:40
  • @lmo sorry about that- have the edits I made made it any clearer? – E.O. Jan 24 '17 at 22:05
  • @RyanMorton thank you very much for this. That looks like it should be exactly what I'm looking for. I'll try it out tomorrow and see if it works out. – E.O. Jan 24 '17 at 22:05
  • @RyanMorton the code hasn't worked out for me (see the edits above). Any idea as to why that might be? I'm wondering if the NAs are affecting the commands... – E.O. Jan 25 '17 at 13:40
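
The rbind-then-sum idea from the comments can be sketched in base R as follows. This is only a sketch: the small dfFreq below is a hypothetical stand-in for the real list of count tables, and the names tagged, stacked and sums are invented for illustration. Calling rbind(dfFreq) on the list itself treats the whole list as a single row, which is why do.call is used to pass each table to rbind as a separate argument.

```r
# Hypothetical stand-in for dfFreq: one count table per sample
# (values invented for illustration, not taken from the real data)
dfFreq <- list(
  Sample1 = data.frame(x = c(0.2, NA), freq = c(1, 2)),
  Sample2 = data.frame(x = c(0, 0.001, NA), freq = c(1, 1, 1))
)

# Tag each count table with the name of the sample it came from
tagged <- Map(function(tbl, name) data.frame(sample = name, tbl),
              dfFreq, names(dfFreq))

# Stack the tables row-wise; do.call passes each table as its own argument
stacked <- do.call(rbind, tagged)

# Sum freq per sample, leaving out the rows that count NA values
sums <- aggregate(freq ~ sample, data = stacked[!is.na(stacked$x), ], FUN = sum)
sums
```

Keeping the sample name as a column is what lets the frequencies be summed per sample rather than per value of x.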

1 Answer


Update: I figured out how to sum my original frequency table and convert it into the data frame I was hoping for. Thanks to Ryan Morton for pointing me in the right direction and providing code.

# Drop the last row of each count table (the NA row in my data; this assumes NA is always last)
dfNARemoved <- lapply(dfFreq, function(x) x[-nrow(x), ])
# Drop the "x" column so that only "freq" remains
dfFreqxRemoved <- lapply(dfNARemoved, function(x) { x["x"] <- NULL; x })
# Sum the remaining frequencies for each sample
dfSum <- lapply(dfFreqxRemoved, sum)
# Now converting into a data frame (rbind and as.data.frame are base R, so dplyr is not needed)
dfBound <- rbind(dfSum)
dfData <- as.data.frame(dfBound)
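
For comparison, here is a minimal base R sketch that appears to reach the same kind of result without building the intermediate list of count tables. It assumes that "frequency of occurrence" means the number of non-NA entries in each sample column, which is what summing freq after dropping the NA row amounts to. The name occurrences is invented for illustration.

```r
# Sample data from the question
X1 <- c("Archea01", "Bacteria01", "Bacteria02")
Sample1 <- c(0.2, NA, NA)
Sample2 <- c(0, 0.001, NA)
Sample3 <- c(0.04, NA, NA)
df <- data.frame(X1, Sample1, Sample2, Sample3)

# Count the non-NA entries in each sample column
# (assumption: a recorded 0 still counts as an occurrence)
occurrences <- colSums(!is.na(df[, -1]))

# Turn the named vector into a one-row data frame, one column per sample
dfData <- as.data.frame(t(occurrences))
dfData
```

If zeros should not count as occurrences, colSums(df[, -1] > 0, na.rm = TRUE) would count only the strictly positive entries instead.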
E.O.