
I am trying to write some kind of loop function that will allow me to apply the same set of code to dozens of data frames that are stored in one list. Each data frame has the same number of columns and identical headers for each column, though the number of rows varies across data frames.

This data comes from an egocentric social network study where I collected ego-network data in edgelist format from dozens of different respondents. The data collection software that I use stores the data from each interview in its own .csv file. Here is an image of the raw data for a specific data frame (image of raw data).

For my purposes, I only need to use data from the fourth, sixth, and seventh columns. Furthermore, I only need rows of data where the last column has values of 4, at which point the final column can be deleted entirely. The end result is a two-column data frame that represents relationships among pairs of people.

After reading in the data and storing it as an object, I ran the following code:

x100291 = `100291AlterPair.csv`       #new object based on raw data
foc.altername = x100291$Alter.1.Name  
altername = x100291$Alter.2.Name      
tievalue = x100291$AlterPair_B        
tie = tievalue                        
tie[(tie<4)] = NA                     
egonet.name = data.frame(foc.altername, altername, tievalue) 
depleted.name = cbind(tie,egonet.name)
depleted.name = depleted.name[is.na(depleted.name[,1]) == F,] 
dep.ego.name = data.frame(depleted.name$foc.altername, depleted.name$altername)

This produced the following data frame (image of final data). This is ultimately what I want.

Now I know that I could cut-and-paste this same set of code 100+ times and manually alter the file names, but I would prefer not to do that. Instead, I have stored all of my raw .csv files as data frames in a single list. I suspect that I can apply the same code across all of the data frames by using one of the apply commands, but I cannot figure it out.

Does anyone have any suggestions for how I might apply this basic code to a list of data frames so that I end up with a new list containing cleaned and reduced versions of the data?

Many thanks!

Richard Telford
C.Whichard
  • If the column names are the same across all data frames, you could do `processList = lapply(fileList, function(x) fn_CustomFunc(DF = x))`, where `fn_CustomFunc` is a custom function that handles all the data processing, and then use `rbind` to combine the output from all data frames. – Silence Dogood Mar 12 '17 at 22:42

2 Answers


The logic can be simplified. Try writing a custom function and applying it to every data frame in the list.

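cleanDF <- function(mydf) {
  # Stop if any of the required columns is missing
  needed <- c('AlterPair_B', 'Alter.1.Name', 'Alter.2.Name')
  if (!all(needed %in% names(mydf))) stop("Check data frame names")

  # Keep rows with a tie value of 4 or more; which() also drops NA values
  condition <- which(mydf[, 'AlterPair_B'] >= 4)
  mydf[condition, c("Alter.1.Name", "Alter.2.Name")]
}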
big_list <- lapply(all_my_files, read.csv)  #read in all data frames
result <- do.call('rbind', lapply(big_list, cleanDF))

The custom function cleanDF first checks that all the relevant column names are present. It then builds the condition that 'AlterPair_B' is 4 or more, and finally subsets the two target columns by that condition. Here 'big_list' is a list holding all of the data frames, and 'all_my_files' is a vector of the .csv file paths.
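If a list of cleaned data frames is wanted rather than one stacked table, the `do.call('rbind', ...)` step can be dropped. A minimal sketch, assuming two made-up data frames (`toy_list`, with invented names and tie values) in place of the real interview files, and a helper `clean_one` that follows the same idea as cleanDF:

```r
# Toy stand-ins for the real interview data (hypothetical values).
toy_list <- list(
  r1 = data.frame(Alter.1.Name = c("Ann", "Ann", "Bob"),
                  Alter.2.Name = c("Bob", "Cat", "Cat"),
                  AlterPair_B  = c(4, 2, NA),
                  stringsAsFactors = FALSE),
  r2 = data.frame(Alter.1.Name = c("Dee", "Dee"),
                  Alter.2.Name = c("Eli", "Fay"),
                  AlterPair_B  = c(1, 4),
                  stringsAsFactors = FALSE)
)

# Same idea as cleanDF; which() drops rows where the tie value is NA.
clean_one <- function(mydf) {
  needed <- c("Alter.1.Name", "Alter.2.Name", "AlterPair_B")
  if (!all(needed %in% names(mydf))) stop("Check data frame names")
  keep <- which(mydf[, "AlterPair_B"] >= 4)
  mydf[keep, c("Alter.1.Name", "Alter.2.Name")]
}

# lapply() returns a list with one cleaned data frame per respondent.
cleaned_list <- lapply(toy_list, clean_one)
```

Because `toy_list` is a named list, `cleaned_list` keeps the same names, so each respondent's edgelist can be retrieved with, for example, `cleaned_list$r1`.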

Pierre L

You haven't provided a reproducible example so it's hard to solve your problem. However, I don't want your questions to remain unanswered. It is true that using lapply would be a fast solution, usually preferable to a loop. However, since you mentioned being a beginner, here's how to do that with a loop, which is easier to understand.

You need to put all your .csv files in a single folder with nothing else in it. Then you read the file names into a character vector and initialize an empty result object with NULL. Finally, you read each file in a loop, do your calculations, and rbind the output onto the result object.

path <- "C:/temp/csv/"
list_of_csv_files <- list.files(path)

result <- NULL
for (filenames in list_of_csv_files) {
  input <- read.csv(paste0(path, filenames), header = TRUE, stringsAsFactors = FALSE)
  # Do your calculations
  input_with_calculations <- input
  result <- rbind(result, input_with_calculations)
}
result
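
For this particular question, the "Do your calculations" step could be filled in as follows. This is only a sketch: it writes two small made-up .csv files to a temporary folder so that it can run on its own, and the folder, file names, and values (`demo_path`, `a.csv`, `b.csv`) are all invented for illustration.

```r
# Create a temporary folder with two made-up csv files (illustration only).
demo_path <- file.path(tempdir(), "csv_demo")
dir.create(demo_path, showWarnings = FALSE)
write.csv(data.frame(Alter.1.Name = c("Ann", "Ann"),
                     Alter.2.Name = c("Bob", "Cat"),
                     AlterPair_B  = c(4, 2)),
          file.path(demo_path, "a.csv"), row.names = FALSE)
write.csv(data.frame(Alter.1.Name = c("Dee", "Dee"),
                     Alter.2.Name = c("Eli", "Fay"),
                     AlterPair_B  = c(1, 4)),
          file.path(demo_path, "b.csv"), row.names = FALSE)

# full.names = TRUE returns complete paths, so no paste0() is needed.
demo_files <- list.files(demo_path, pattern = "\\.csv$", full.names = TRUE)

demo_result <- NULL
for (f in demo_files) {
  input <- read.csv(f, header = TRUE, stringsAsFactors = FALSE)
  # Keep rows with a tie value of 4 or more, and only the two name columns.
  keep <- which(input$AlterPair_B >= 4)
  demo_result <- rbind(demo_result,
                       input[keep, c("Alter.1.Name", "Alter.2.Name")])
}
demo_result
```

Growing `demo_result` with `rbind` inside the loop is easy to follow but copies the accumulated data on every pass, so for a large number of files collecting the pieces in a list and binding once at the end may be preferable.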
Pierre Lapointe