1

New-ish to R and I feel like this has a simple solution, but I can't figure it out.

I have 59 excel files that I want to combine. However, 4 of the columns have a mix of dates and NA's (depending on if the study animal is a migrant or not) so R won't let me combine them because some are numeric and some are character. I was hoping to read all of the excel files into R, convert those 4 columns in each file to as.character, and then merge them all. I figured a loop could do this.

Anything I find online has me typing out the name for each read file, which I don't really want to do for 59 files. And once I do have them read into R and those columns converted, can I merge them from R easily? Sorry if this is simple, but I'm not sure what to do that would make this easier.

jpsmith
  • 11,023
  • 5
  • 15
  • 36
  • Welcome to Stack Overflow! Could you clarify what you mean by "combine"? Do they have a unique identifier or do you want to append all the columns form the 59 excel files? – jpsmith Dec 03 '22 at 22:55
  • Thank you! And sorry about that, to clarify, they all have the same column names, I just want to append all of the rows together. – Rebekah Lumkes Dec 03 '22 at 23:52
  • Thanks for clarifying! See option 2b below and see if that does what you need - if not I can edit my answer – jpsmith Dec 03 '22 at 23:54

1 Answers1

0

You can do this quickly using lapply. It was unclear exactly how you wanted to combine the files (a true merge by a common variable, or append the rows, or append the columns). Either way, I do not believe you need to change anything to as.character for any of the below approaches (2A - 2C) to work:

library(readxl)

# 1. Read in all excel files in a folder given a specific filepath
filepath <- "your/file/path/"
file_list <- list.files(path = filepath, pattern='*.xlsx')
df_list <- lapply(file_list, read_excel)

# 2a. Merge data (assuming a unique identifier, i.e "studyid")
final_data <- Reduce(function(...) merge(..., by = "studyid", all = TRUE), df_list)

# 2b. If all files have the same columns, append to one long dataset
final_data <- do.call(rbind, df_list)

# 2c. If you want to make a wide dataset (append all columns)
final_data <- do.call(cbind, df_list)
jpsmith
  • 11,023
  • 5
  • 15
  • 36
  • Thanks everyone! @jpsmith, it was working well until I got this: Error in as.POSIXlt.character(x, tz, ...) : character string is not in a standard unambiguous format. The specific columns have both dates and NA's because it's based on animal data and some of the animals are migrants (thus they have start and end dates) and some do not migrate and have NAs. Is there a way to ignore NA's? – Rebekah Lumkes Dec 04 '22 at 00:20
  • Try converting everything to characters before using the `do.call(..)` step by `df_list <- lapply(df_list , function(x) apply(x, 2, as.character))`. You can then convert back to date variables a number of ways depending on your data – jpsmith Dec 04 '22 at 00:32
  • 1
    Thank you, that worked! And yes, I can convert them back to dates later, but for now, I'm a happy grad student. Thanks again! – Rebekah Lumkes Dec 04 '22 at 00:53