2

I am looking to standardize cleaning exports from Survey Monkey at my organization, and want to rename the column name to (Column name + first row name) if the first row is not NA.

Edit: This would ideally be implemented in a function/loop so that it would work on data frames of varying sizes, without having to edit any parameters.

Reprex:

df <- tribble(
  ~`Which of these choices do you like`, ~`...1`, ~`...2`, ~`...3`, ~`Respondent ID`, ~`Different Text`, ~`...4`,
  'Fruit', 'Drink', 'Dessert', 'Snack', NA, 'Pizza Topping', 'Pizza Style',
  'Apple', 'Water', 'Pie', 'Oreos', 1234, 'Mushroom', 'Deep Dish',
  'Apple', 'Coffee', 'Cake', 'Granola', 1235, 'Onion', 'NY Style',
  'Banana', 'Coffee', 'Pie', 'Oreos', 1236, 'Mushroom', 'NY Style',
  'Pear', 'Vodka', 'Pie', 'Granola', 1237, 'Onion', 'Deep Dish'
)

After the columns are renamed, I would delete the first row and carry on with my life.

Ideally, my df would look like this:

enter image description here

Thank you for any guidance!

Matt
  • 7,255
  • 2
  • 12
  • 34

1 Answers1

3

In base R, we can use paste and then remove the first row

names(df)[1:4] <- paste0(names(df)[1], unlist(df[1, 1:4]))
df <- df[-1, ]

Or using sprintf

names(df)[1:4] <- sprintf("%s (%s)", names(df)[1], unlist(df[1, 1:4]))
df <- df[-1,]

If we want to do this by checking the NA elements

library(dplyr)
library(tidyr)
library(purrr)
library(stringr)
keydat <- df %>%
          slice(1) %>% 
          select_if(negate(is.na)) %>%
          pivot_longer(everything()) %>%
          group_by(grp = cumsum(!startsWith(name, "..."))) %>% 
          mutate(value = sprintf("%s (%s)", first(name), value)) %>% 
          ungroup %>% 
          select(-grp)


df <- df %>%
        rename_at(vars(keydat$name), ~ keydat$value) %>%
        slice(-1)

df
# A tibble: 4 x 7
#  `Which of these… `Which of these… `Which of these… `Which of these… `Respondent ID`
#  <chr>            <chr>            <chr>            <chr>                      <dbl>
#1 Apple            Water            Pie              Oreos                       1234
#2 Apple            Coffee           Cake             Granola                     1235
#3 Banana           Coffee           Pie              Oreos                       1236
#4 Pear             Vodka            Pie              Granola                     1237
# … with 2 more variables: `Different Text (Pizza Topping)` <chr>, `Different Text (Pizza
#   Style)` <chr>

names(df)
#[1] "Which of these choices do you like (Fruit)"   "Which of these choices do you like (Drink)"  
#[3] "Which of these choices do you like (Dessert)" "Which of these choices do you like (Snack)"  
#[5] "Respondent ID"                                "Different Text (Pizza Topping)"              
#[7] "Different Text (Pizza Style)"      
akrun
  • 874,273
  • 37
  • 540
  • 662
  • This does work for the reprex, but I would be looking to do this at scale, which could mean data frames of varying sizes. Ideally, I would want a function or loop that is able to scan the entire data frame without adjusting any values. – Matt Feb 28 '20 at 18:02
  • This is much closer, but isn't working exactly as I would like. After the first non-NA value in the first row, the column name is copied over all of the following columns. I'm not familiar with a lot of the functions you used, so I'm not in a place where I can really troubleshoot it. I appreciate all of your help so far! – Matt Feb 28 '20 at 18:40
  • @Matt not clear about the issue. Can you update your post with a new example that shows the issue – akrun Feb 28 '20 at 18:42
  • @Matt isnt this the names you want in the expected output `[1] "Which of these choices do you like (Fruit)" "Which of these choices do you like (Drink)" #[3] "Which of these choices do you like (Dessert)" "Which of these choices do you like (Snack)" #[5] "Respondent ID"` – akrun Feb 28 '20 at 18:43
  • I updated the reprex to show the issue. When a non-NA field is found in the first row, I would want the column + first row copied from that specific column. Your solution finds the first non-NA column name, and then copies it to the rest of the columns. – Matt Feb 28 '20 at 18:56
  • @Matt Are you removing `Different Text` column? In that case, what would be the logic – akrun Feb 28 '20 at 18:58
  • I updated the image to show what I'd like to achieve. I need the column names to be as descriptive as possible, so they can either be referenced or renamed by a team of analysts. – Matt Feb 28 '20 at 19:08
  • @Matt I guess the new update should work because I earlier I was using a different logic – akrun Feb 28 '20 at 19:37
  • For some reason, this still isn't working exactly as I would like, but it gives me a great place to start. Thank you! – Matt Feb 28 '20 at 20:00
  • @Matt Can you check the order of your columns. Here, how I renamed is based on the occurence of `...` column names and renamed by creating a group whenever there is non `...` as starting element – akrun Feb 28 '20 at 20:03