0

I am pretty ok with loops in Stata and I am trying to program something similar in R, but I keep getting error messages- any idea why this loop doesnt work?

I have an Excel sheet with data with separate tabs for each month (as in monthlist). I want to import each separate tab and add year and month column to it.

monthlist = list("Jan", "Feb", "Mar", "Apr", "May", "June", "Jul", "Aug", 
"Sep", "Oct", "Nov", "Dec")

for (k in seq_along(monthlist)){
   infile <- paste(i, " GP",".xlsx",sep=""); name<- 
   paste("X",i,"_GP",sep="")
   name.[k]<- read_excel(infile, sheet = [k])
   month=[k]
   name.[k] = cbind(year, month, name.[k])
  }

I have tried and tried researching the correct reference to k values, but I cant figure it out. Please help.

Ania K
  • 9
  • 1
  • Where are you getting the year? I don't see it in the loop. – AndS. Sep 21 '18 at 20:45
  • 1
    Some good advice on reading in from multiple excel worksheets here: https://readxl.tidyverse.org/articles/articles/readxl-workflows.html https://stackoverflow.com/questions/51120287/reading-multiple-xlsx-files-each-with-multiple-sheets-purrr – Jon Spring Sep 21 '18 at 20:46
  • Please post the error message(s) you're getting. – Mike S Sep 21 '18 at 20:56
  • I only asked about the sub-loop to not be too general, but here's the whole loop, including year: i=2014 monthlist = list("Jan", "Feb", "Mar", "Apr", "May", "June", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec") while (i<=2016) { year=i for (k in seq_along(monthlist)){ infile <- paste(i, " GP",".xlsx",sep=""); name<- paste("X",i,"_GP",sep="") name.[k]<- read_excel(infile, sheet = [k]) month=[k] name.[k] = cbind(year, month, name.[k]) } i=i+1 } – Ania K Sep 21 '18 at 21:00
  • Mike S, I saw that import article on multiple sheets, but it gets a little bit more complicated because, while most columns on my sheets are the same, some differ, and I can rbind the data.frame (or so I think, because trial gave me multiple errors). So I went around writing my own loop. I guess the more general question is about cell referencing - in my loop i write for (k in seq_along(monthlist)){ }- but how do i correctly refer to that "k" inside the loop? – Ania K Sep 21 '18 at 21:04
  • So you have 3 files (2014, 2015, 2016) and each file has a sheet per month and you want to read each sheet from each file into its own dataframe? – AndS. Sep 21 '18 at 21:06
  • yes, exactly. Then I will merge them and drop non-matching columns. My error messages are: Error: unexpected '[' in: " infile <- paste(i, " GP",".xlsx",sep=""); name<- paste("X",i,"_GP",sep="") name.[k]<- read_excel(infile, sheet = [" > month=[k] Error: unexpected '[' in " month=[" > name.[k] = cbind(year, month, name.[k]) Error in cbind(year, month, name.[k]) : object 'name.' not found – Ania K Sep 21 '18 at 21:07

2 Answers2

0

If I understand correctly following the comments, I think you want to take data from different sheets across several workbooks, and assign each sheet's data to its own separate dataframe, so that you have many dataframes; and you don't want to do this combining inside the loop because each could be structured differently. If that's the case then you could modify your code to this:

monthlist = list("Jan", "Feb", "Mar", "Apr", "May", "June", "Jul", "Aug", 
"Sep", "Oct", "Nov", "Dec")

for (k in seq_along(monthlist)){
   infile = paste(i, " GP",".xlsx",sep="")
   name = paste("X",i,"_GP",sep="")
   d_temp = read_excel(infile, sheet = k)
   assign(paste0(year, month), d_temp)
  }

The key points there are to drop the square brackets around k when reading in the file, and secondly to use assign() to dynamically create a variable name. d_temp temporarily stores the data frame before assign creates a new distinct variable from it.

I note references to year and i in your code which I assume you have set beforehand, so I'm not addressing those.

Mike S
  • 312
  • 1
  • 8
0

Without your data, it is a little hard to tell you if this is going to work or not, but here are a few options. I would personally create a nested dataframe of dataframes if I wasn't sure that all dataframes had the same number of rows.

library(tidyverse)
library(readxl)

df <- data_frame(year = rep(c(2014, 2015, 2016), each = 12), 
                 month = rep(c("Jan", "Feb", "Mar", "Apr", "May", "June", "Jul", "Aug", 
                              "Sep", "Oct", "Nov", "Dec"), 3)) %>%
  mutate(infile = map_chr(year, ~paste(.x, " GP",".xlsx",sep="")),
         data = map2(infile, month, ~read_excel(.x, sheet = .y)),
         data = map2(data, month, ~mutate(.x, month = .y)),
         data = map2(data, year, ~mutate(.y, year = .y)))

However, if you want to use a loop, you could also do that and I would recommend a list of dataframes in that case.

year <- c(2014:2016)
month = c("Jan", "Feb", "Mar", "Apr", "May", "June", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec") 
df.list <- list()
for(i in seq_along(year)){
  for (k in seq_along(month)){ 
  infile <- paste(year[[i]], " GP",".xlsx",sep="") 
  name<- paste("X",year[[i]],"_GP",sep="") 
  df.list[[i]] <- read_excel(infile, sheet = month[[k]]) 
  }
}
AndS.
  • 7,748
  • 2
  • 12
  • 17