2

I would greatly appreciate assistance from an R expert after struggling to solve this challenge for a few hours by trying tidyr, reshape, spread, etc.

With a data frame of varying groups and assigned values, is there a way to convert and transpose the data frame such that each group is assigned to a new column and all of the assigned values are listed under the group?

Here is some sample code for a data frame:

a <- c("Group1", "Group1", "Group1", "Group2", "Group2", "Group2", "Group2", "Group2", "Group3")
b <- c("Item1", "Item2", "Item3", "Item4", "Item5", "Item6", "Item7", "Item8", "Item9") 

With these two columns, create a new column for each group.

Below, I am showing this manually but would need R to automate this next step.

I manually included "--" to avoid the error "Error in data.frame: arguments imply differing number of rows". In practice, I would not be able to add "--" for each group.

Group1 <- c("Item1", "Item2", "Item3", "--", "--")
Group2 <- c("Item4", "Item5", "Item6", "Item7", "Item8")
Group3 <- c("Item9", "--", "--", "--", "--")

Below, this is the output that I am trying to create.

table <- data.frame(Group1, Group2, Group3)

The challenge is that the variables have to be dynamic. The number of groups and number of items will change for different data sets, and I am unable to manually "--" for the blanks in each group.

The question is similar to this one, except my question involves dynamic ranges. Convert data frame common rows to columns

milaske
  • 55
  • 2
  • 8

1 Answers1

10

We can use tidyr::spread

library(tidyverse)
df %>% group_by(a) %>% mutate(n = 1:n()) %>% spread(a, b) %>% select(-n)
## A tibble: 5 x 3
#  Group1 Group2 Group3
#  <fct>  <fct>  <fct>
#1 Item1  Item4  Item9
#2 Item2  Item5  NA
#3 Item3  Item6  NA
#4 NA     Item7  NA
#5 NA     Item8  NA

Or if you prefer "--" instead of NA you can do (thanks @AntoniosK)

df %>%
    group_by(a) %>%
    mutate(n = 1:n()) %>%
    spread(a, b) %>%
    select(-n) %>%
    mutate_all(~ifelse(is.na(.), "--", as.character(.)))
## A tibble: 5 x 3
#  Group1 Group2 Group3
#  <chr>  <chr>  <chr>
#1 Item1  Item4  Item9
#2 Item2  Item5  --
#3 Item3  Item6  --
#4 --     Item7  --
#5 --     Item8  --

or using tidyr::spreads fill argument

df %>%
    mutate_if(is.factor, as.character) %>%
    group_by(a) %>%
    mutate(n = 1:n()) %>%
    spread(a, b, fill = "--") %>%
    select(-n)

giving the same result.


Sample data

a <- c("Group1", "Group1", "Group1", "Group2", "Group2", "Group2", "Group2", "Group2", "Group3")
b <- c("Item1", "Item2", "Item3", "Item4", "Item5", "Item6", "Item7", "Item8", "Item9")
df <- data.frame(a = a, b = b)
Maurits Evers
  • 49,617
  • 4
  • 47
  • 68