I am having difficulty using a for loop for appending new data to each data frame element of a list.

I have a list of two data frames (filelist) and want to "dplyr::left_join" or "merge" each data frame in the list with data from a single other data frame. When I do this in a for loop, the joined columns do not appear in the list afterward. However, if I run the same commands stepwise and separately for each data frame element of the list, I get the same warnings (due to missing factor levels) but also the desired result. For example:

# some data frames

df1 <- data.frame(x = 1:3, y=letters[1:3])
df2 <- data.frame(x = 1:5, y=letters[1:5])

# make list of dataframes
filelist <- list(df1,df2)

# new data frame to add to the data frames in the list by indexing "y"
df3 <- data.frame(animal = c(rep("snake", 7)), y=letters[1:7], geno = c("aa", "ab", "ac", "aa", "ac", "ab", "ae"))

# merge df3 into both data frames in the filelist
for (i in 1:length(filelist)) {dplyr::left_join(filelist[[i]], df3, by = "y")}

## Gives the following warning because some factor levels are missing between datasets
Warning message:
Column `y` joining factors with different levels, coercing to character vector 

returned result is the same as the original filelist

> filelist
[[1]]
  x y
1 1 a
2 2 b
3 3 c

[[2]]
  x y
1 1 a
2 2 b
3 3 c
4 4 d
5 5 e

The expected result (done by merging each element of the list separately, then making a new list)

new1 <- dplyr::left_join(filelist[[1]], df3, by = "y")
new2 <- dplyr::left_join(filelist[[2]], df3, by = "y")
newlist <- list(new1, new2)
> newlist
[[1]]
  x y animal geno
1 1 a  snake   aa
2 2 b  snake   ab
3 3 c  snake   ac

[[2]]
  x y animal geno
1 1 a  snake   aa
2 2 b  snake   ab
3 3 c  snake   ac
4 4 d  snake   aa
5 5 e  snake   ac

What is the best way to do this without taking each data frame out of the original list, adding the new data, then creating a new list?

EMS_glenn

2 Answers

I'd use the map function from the purrr package, which, like dplyr, is part of the tidyverse:

library(tidyverse)
library(purrr) # loaded when you call tidyverse, but doing it explicitly here

map(filelist, left_join, df3)

[[1]]
  x y animal geno
1 1 a  snake   aa
2 2 b  snake   ab
3 3 c  snake   ac

[[2]]
  x y animal geno
1 1 a  snake   aa
2 2 b  snake   ab
3 3 c  snake   ac
4 4 d  snake   aa
5 5 e  snake   ac

Warning messages:
1: Column `y` joining factors with different levels, coercing to character vector 
2: Column `y` joining factors with different levels, coercing to character vector 
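
Note that map() returns a new list and leaves filelist untouched, just as the for loop in the question computes each join and then discards it. A minimal sketch of storing the result, assuming the example data from the question (the explicit `by = "y"` and the `stringsAsFactors = TRUE` setting are my additions; the latter mimics the older R default under which `y` is a factor):

```r
library(dplyr)
library(purrr)

# Example data from the question. stringsAsFactors = TRUE is an assumption
# that reproduces the pre-R-4.0 default, where y is stored as a factor.
df1 <- data.frame(x = 1:3, y = letters[1:3], stringsAsFactors = TRUE)
df2 <- data.frame(x = 1:5, y = letters[1:5], stringsAsFactors = TRUE)
df3 <- data.frame(animal = rep("snake", 7), y = letters[1:7],
                  geno = c("aa", "ab", "ac", "aa", "ac", "ab", "ae"),
                  stringsAsFactors = TRUE)
filelist <- list(df1, df2)

# map() builds a new list; assigning it back replaces the original elements.
filelist <- map(filelist, function(d) left_join(d, df3, by = "y"))
```

Whether the factor-level warning appears, and whether `y` comes back as a factor or a character vector, may depend on the installed dplyr version.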
Ben G
  • Thanks that worked too, but I liked the Pelilican's option as it kept the joining variable as a factor (no big deal, but one less thing to do) – EMS_glenn Apr 04 '19 at 12:11
  • you mean the "by" argument in `left_join`? You can add that, but it's not necessary. – Ben G Apr 04 '19 at 12:25
  • No, not the "by" argument. This "map" method coerced the factor to a character (as per the warnings), so it would just have to be converted back to a factor, that's all. But your solution worked just fine too - sorry, newbie, not sure of the etiquette for marking an answer as "answered". – EMS_glenn Apr 04 '19 at 12:44
  • Yeah, completely up to you and how you want to do things. Would be pretty easy to make it a factor again with `mutate(y = as_factor(y))`. I try to avoid for loops whenever possible. – Ben G Apr 04 '19 at 13:20
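
For reference, the conversion back to a factor mentioned in the last comment could look roughly like this. It is only a sketch: the data are the example from the question, created here with `stringsAsFactors = FALSE` (an assumption) so that `y` is a character vector after the join, and `joined` is a name made up for illustration.

```r
library(dplyr)
library(purrr)
library(forcats)

# Example data from the question, with y as character (assumption) to mimic
# the state after the join has coerced the factor to a character vector.
df1 <- data.frame(x = 1:3, y = letters[1:3], stringsAsFactors = FALSE)
df2 <- data.frame(x = 1:5, y = letters[1:5], stringsAsFactors = FALSE)
df3 <- data.frame(animal = rep("snake", 7), y = letters[1:7],
                  geno = c("aa", "ab", "ac", "aa", "ac", "ab", "ae"),
                  stringsAsFactors = FALSE)
filelist <- list(df1, df2)

# Join every element, then turn y back into a factor in every element.
joined <- map(filelist, function(d) left_join(d, df3, by = "y"))
joined <- map(joined, function(d) mutate(d, y = as_factor(y)))
```

Each element ends up with its own set of levels for `y` (only the values present in that data frame), which may or may not be what you want.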
As the warning message says, the factors have different levels.

You can convert the factors to characters in each data frame with dplyr, as follows:

df %>% mutate_if(is.factor, as.character) -> df

Or homogenize the factor levels of variable y, and assign the result of the join back into the list:

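for (i in seq_along(filelist)) {
  # union of the levels of y in this data frame and in df3
  lv <- union(levels(as.factor(filelist[[i]]$y)), levels(as.factor(df3$y)))
  # rebuild both factors on the shared level set (values are matched by
  # name, so existing values are never relabelled by position)
  filelist[[i]]$y <- factor(filelist[[i]]$y, levels = lv)
  df3$y <- factor(df3$y, levels = lv)
  # store the joined result back in the list
  filelist[[i]] <- dplyr::left_join(filelist[[i]], df3, by = "y")
}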
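
The first option (converting factors to characters) can also be applied to every element of the list before joining. A rough sketch, assuming the example data from the question with `y` stored as a factor; `to_chr` is a hypothetical helper name, not part of dplyr:

```r
library(dplyr)

# Example data from the question. stringsAsFactors = TRUE is an assumption
# that reproduces the pre-R-4.0 default, where y is stored as a factor.
df1 <- data.frame(x = 1:3, y = letters[1:3], stringsAsFactors = TRUE)
df2 <- data.frame(x = 1:5, y = letters[1:5], stringsAsFactors = TRUE)
df3 <- data.frame(animal = rep("snake", 7), y = letters[1:7],
                  geno = c("aa", "ab", "ac", "aa", "ac", "ab", "ae"),
                  stringsAsFactors = TRUE)
filelist <- list(df1, df2)

# Hypothetical helper: convert every factor column of a data frame to character.
to_chr <- function(d) mutate_if(d, is.factor, as.character)

# Convert all inputs first, then join and keep the result in the list.
filelist <- lapply(filelist, to_chr)
df3 <- to_chr(df3)
filelist <- lapply(filelist, function(d) left_join(d, df3, by = "y"))
```

Because both sides of the join are plain character columns by then, there are no factor levels left to disagree.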
ophdlv