I am learning R and trying to figure out how to split a column. I want to spread the values of a single column into wide format. I was told to use dcast, but I haven't figured out the best way to do it, and was going to try piping it through the tidyverse instead.

# sample data
data <- data.frame(trimesterPeriod = c("first", "second", "third", "PP", "third", "second", "PP", "first"))
# dataframe 
  trimesterPeriod 
1 first
2 second
3 third
4 PP
5 third
6 second
7 PP
8 first

and I would like it to look like this:

#dataframe
ID     first       second       third       PP
1        1            0           0         0
2        0            1           0         0 
3        0            0           1         0
4        0            0           0         1 
5        0            0           1         0 
6        0            1           0         0 
7        0            0           0         1
8        1            0           0         0 

I know that I will have to convert the trimesterPeriod data from character, but from there I'm not sure where to go. I was thinking of doing:

data.frame %>%
    mutate(rn = row_number(first, second, third, PP)) %>%
    spread(trimesterPeriod) %>%
    select(-rn)

but I'm not sure. Any suggestions are greatly appreciated!

lziegs

2 Answers

We could use table from base R

table(seq_len(nrow(data)), data$trimesterPeriod)

-output

    first PP second third
  1     1  0      0     0
  2     0  0      1     0
  3     0  0      0     1
  4     0  1      0     0
  5     0  0      0     1
  6     0  0      1     0
  7     0  1      0     0
  8     1  0      0     0
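
Note that table() puts the columns in sorted order, so PP lands before second here. If the column order from the question matters, one option is to convert the column to a factor with explicit levels first, then turn the table into a regular data frame. A minimal sketch, assuming the sample data from the question; the names period, tab and wide are only illustrative:

```r
# Sample data as in the question
data <- data.frame(
  trimesterPeriod = c("first", "second", "third", "PP",
                      "third", "second", "PP", "first")
)

# Explicit factor levels control the order of the resulting columns
period <- factor(data$trimesterPeriod,
                 levels = c("first", "second", "third", "PP"))

# Cross-tabulate the row index against the period: one row per observation
tab <- table(seq_len(nrow(data)), period)

# Convert the table to a plain data frame and add an ID column
wide <- cbind(ID = seq_len(nrow(data)), as.data.frame.matrix(tab))
wide
```

Because each row index occurs exactly once, every row of wide has a single 1 in the column matching its period and 0 elsewhere.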

Or using tidyverse

library(dplyr)
library(tidyr)
 data %>% 
   mutate(ID = row_number()) %>%
   pivot_wider(names_from = trimesterPeriod, 
     values_from = trimesterPeriod, values_fn = length, 
        values_fill = 0)

-output

# A tibble: 8 × 5
     ID first second third    PP
  <int> <int>  <int> <int> <int>
1     1     1      0     0     0
2     2     0      1     0     0
3     3     0      0     1     0
4     4     0      0     0     1
5     5     0      0     1     0
6     6     0      1     0     0
7     7     0      0     0     1
8     8     1      0     0     0

data

data <- structure(list(trimesterPeriod = c("first", "second", "third", 
"PP", "third", "second", "PP", "first")),
 class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8"))
akrun

Using dcast from data.table -

library(data.table)

dcast(setDT(data), seq_len(nrow(data)) ~ trimesterPeriod, 
      value.var = 'trimesterPeriod', fun.aggregate = length)

#   data PP first second third
#1:    1  0     1      0     0
#2:    2  0     0      1     0
#3:    3  0     0      0     1
#4:    4  1     0      0     0
#5:    5  0     0      0     1
#6:    6  0     0      1     0
#7:    7  1     0      0     0
#8:    8  0     1      0     0
Ronak Shah