2

I have a batch of data sets with many variables. I need to rename some variables in each data set to a common name, without changing the others. One issue is that each the column I want to rename is in a different location in each data set and each data set has a different number of columns. Example data and code below.

Working Data:

package(tidyverse)

df1 <- tribble(
~var1, ~var2, ~var3, ~var4, ~var5,
"1",   "1",   "1",   "a",   "d",
"2",   "2",   "2",   "b",   "e",
"3",   "3",   "3",   "c",   "f"
)

df2 <- tribble(
~var1, ~help, ~var3,
"1",   "1",   "1",
"2",   "2",   "2",
"3",   "3",   "3"
)

df3 <- tribble(
~var1, ~newCol, ~var3, ~help, ~var5, ~var6,
"1",   "4",     "1",   "a",   "d",   "1",
"2",   "5",     "2",   "b",   "e",   "2",
"3",   "6",     "3",   "c",   "f",   "3"
)

I would like to recode help to var2 so it is common across the data sets. I know that I can do it individually using dplyr like this:

df2 <- df2 %>%
  rename(var2 = help)

df3 <- df3 %>%
  rename(var2 = help)

But I have to do this across over 100 data sets, and I would like to do it more efficiently. I was using the suggested answer from this post as an example, but it only works for data sets that have the same number and order of columns, which mine do not. I did try this, but it didn't work:

dfs <- list(df2$help = df2$help, df3$help = df3$help)

colnames <- c("var2", "var2") 

list2env(lapply(dfs, setNames, colnames), .GlobalEnv)

But I get the following error at the dfs <- list(df2$help = df2$help... command:

Error: unexpected '=' in "dfs <- list(df2$help ="

All of the examples I could find do not show how to change specific columns across data sets, only how to change all column names or change a small number of columns in data sets that only have a few columns.

scottsmith
  • 371
  • 2
  • 11

2 Answers2

2

This is what you are looking for:

 dat=lapply(list(df2=df2,df3=df3),function(x){names(x)[which(names(x)=="help")]="var2";x})
 list2env(dat,.GlobalEnv)
Onyambu
  • 67,392
  • 3
  • 24
  • 53
  • Thanks @onyambu, but this only works if the data sets have the same number or columns and/or the column that needs to be changed is in the same location. My data sets are not like that. I updated the example data to be more realistic to my actual data. I was hoping I could "call out" each column in the data sets and rename, no matter the column location or number of columns in each data set. – scottsmith Dec 25 '17 at 01:17
  • so what specific name do you want to give to help in this case? – Onyambu Dec 25 '17 at 01:20
  • In each data set, I would like to change `help` to `var2`, no matter its location in its respective data set. – scottsmith Dec 25 '17 at 01:22
  • I changed it `var2` in `df3` to `newCol`. For more context, my real-world data has two different names for income: `income` and `hhIncome`. I need to change all the `hhIncome`s to `income` before I merge the data sets. Each data set has `income` and `hhIncome` in different column locations because they added and deleted variables with each new survey data collection. – scottsmith Dec 25 '17 at 01:33
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/161920/discussion-between-scottsmith-and-onyambu). – scottsmith Dec 25 '17 at 01:43
1

Here's a solution with tidyverse

library(tidyverse)

list(df1, df2, df3) %>% 
map(~if("help" %in% names(.x))rename(.x, var2=help) else .x)
dmi3kno
  • 2,943
  • 17
  • 31