
I would like to use tapply on a list of data frames in order to calculate sums for individual groups and then tabulate how often the value 0 occurs. On an individual data frame I would do this:

sums <- tapply(my_data_frame$V3, my_data_frame$V2, sum)
table(unlist(sums==0))

Since I have to calculate this for a number of files, I have loaded them all into a list:

files <- Sys.glob("*txt")
listOfFiles <- lapply(files, function(x) read.table(x, skip = 1, sep = "\t"))
listOfFiles <- lapply(listOfFiles, function (x) na.omit(x))

I have tried this, but it does not work:

lapply(listOfFiles, tapply(
    lapply(listOfFiles, "[", c(2)),
    lapply(listOfFiles, "[", c(3)),
    sum)
)

Could someone give me any hints on what to do?

smci
  • Hello Mareike. Please read [How to Create a Minimal, Complete, and Verifiable Example](https://stackoverflow.com/help/mcve) and update your question. Also, please explain what constitutes a "group" in your question. – Len Greski Oct 13 '18 at 23:27
  • That's not a list of dataframes: `my_data_frame$V3, my_data_frame$V2`, it's a list of columns (or 'individual groups' as you call them). Do you want to reference the columns by name, or by index? – smci Oct 13 '18 at 23:34
  • The intent of your `lapply(listOfFiles, "[", c(2))` and `...c(3)` is simply to slice columns 2,3 from each dataframe. There are many easier ways to do that, and duplicate questions. First, what data structure do you want to use to keep multiple dataframes? list? tibble? – smci Oct 13 '18 at 23:40

1 Answer

Consider wrapping your read.table, tapply, and table calls in a generalized function, then have lapply iterate over the files and call that function on each one:

proc_sums <- function(myfile) {
   # READ FILE INTO DATA FRAME
   my_data_frame <- read.table(myfile, skip = 1, sep = "\t")
   my_data_frame <- na.omit(my_data_frame)

   # RUN GROUP SUMS
   sums <- tapply(my_data_frame$V3, my_data_frame$V2, sum)
   tbl <- table(unlist(sums==0))

   return(tbl)
}

files <- Sys.glob("*txt")
# ITERATE THROUGH FILES AND CALL PROCEDURE
list_of_sum_tables <- lapply(files, proc_sums)
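
If the data frames are already loaded into a list, as in the question, the same idea should work without re-reading the files. The attempt in the question fails because lapply expects a function as its second argument, whereas `tapply(...)` is evaluated right away and is handed lists of data frames instead of vectors. Below is a minimal sketch, assuming the columns are named V2 and V3 as in the question; the toy data frames and the helper name `count_zero_sums` are made up for illustration:

```r
# Toy stand-ins for the data frames read from the files (made-up values).
listOfFiles <- list(
  a = data.frame(V1 = 1:4, V2 = c("g1", "g1", "g2", "g2"), V3 = c(0, 0, 1, 2)),
  b = data.frame(V1 = 1:4, V2 = c("g1", "g2", "g3", "g3"), V3 = c(0, 5, -1, 1))
)

# Hypothetical helper: sum V3 within each V2 group of ONE data frame,
# then tabulate how many group sums are zero (TRUE) or non-zero (FALSE).
count_zero_sums <- function(df) {
  sums <- tapply(df$V3, df$V2, sum)
  table(as.vector(sums == 0))
}

# lapply passes each data frame in the list to the helper in turn.
zero_tables <- lapply(listOfFiles, count_zero_sums)
```

Each element of `zero_tables` is then the table for the corresponding data frame, so `zero_tables$a` can be inspected on its own.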
Parfait