2

I have a data frame with exam questions organized as you see below in input. I'm trying to organize it in a tidy way as displayed in output.

In input you can see student's ID, their answer to the specific item suffix = ".text", their score for that specific item suffix = ".score", and their total score.

Input

library(tibble)

input <- tribble(
     ~ID, ~i1.text, ~i1.score, ~i2.text, ~i2.score, ~total,
  "Mark",      "A",        0L,      "B",        1L,     1L,
  "Mary",      "C",        1L,      "D",        0L,     1L,
  "John",      "A",        0L,      "B",        1L,     1L,
  "Abby",      "C",        1L,      "B",        1L,     2L
  )

I need to pivot the data to look like the one below.

I'm quite certain that I can do that with pivot_longer() but I'm quite lost.

Expected Output

output <- tribble(
     ~ID, ~item, ~text, ~score, ~total,
  "Mark",  "i1",   "A",     0L,     1L,
  "Mark",  "i2",   "B",     1L,     1L,
  "Mary",  "i1",   "C",     1L,     1L,
  "Mary",  "i2",   "D",     0L,     1L,
  "John",  "i1",   "A",     0L,     1L,
  "John",  "i2",   "B",     1L,     1L,
  "Abby",  "i1",   "C",     1L,     2L,
  "Abby",  "i2",   "B",     2L,     2L
  )

output

# A tibble: 8 × 5
  ID    item  text  score total
  <chr> <chr> <chr> <int> <int>
1 Mark  i1    A         0     1
2 Mark  i2    B         1     1
3 Mary  i1    C         1     1
4 Mary  i2    D         0     1
5 John  i1    A         0     1
6 John  i2    B         1     1
7 Abby  i1    C         1     2
8 Abby  i2    B         2     2
Ruam Pimentel
  • 1,288
  • 4
  • 16

2 Answers2

2

We can use pivot_longer with names_sep as . - the column 'item' return the prefix part of the column names before the . and the .value will return the values of the column with the suffix part of the column name after the .

library(tidyr)
pivot_longer(input, cols = contains("."), 
    names_to = c("item", ".value"), names_sep = "\\.")

-output

# A tibble: 8 × 5
  ID    total item  text  score
  <chr> <int> <chr> <chr> <int>
1 Mark      1 i1    A         0
2 Mark      1 i2    B         1
3 Mary      1 i1    C         1
4 Mary      1 i2    D         0
5 John      1 i1    A         0
6 John      1 i2    B         1
7 Abby      2 i1    C         1
8 Abby      2 i2    B         1
akrun
  • 874,273
  • 37
  • 540
  • 662
  • Could you add to your response an explanation about why `names_sep` use `”\\.” ` instead of just “.”? – Ruam Pimentel Apr 01 '22 at 19:35
  • @RuamPimentel by defaut, it is regex mode and `.` can match any character in `regex`. Escaping is done to read it literally – akrun Apr 01 '22 at 21:07
2

Here is an similar but alternative approach using names_pattern:

Explanation: (.*)\\.(.*) is regex:

() captures a group

. captures any character

* is a quantifier means 0 or more

\\ Escape character

The regular expression means: any amount of character followed by a dot followed by any amount of character. This regex matches your column names:

library(dplyr)
library(tidyr)

input %>% 
  pivot_longer(
    cols = -c(ID, total), 
    names_to = c('item', '.value'),
    names_pattern = '(.*)\\.(.*)'
    )
  ID    total item  text  score
  <chr> <int> <chr> <chr> <int>
1 Mark      1 i1    A         0
2 Mark      1 i2    B         1
3 Mary      1 i1    C         1
4 Mary      1 i2    D         0
5 John      1 i1    A         0
6 John      1 i2    B         1
7 Abby      2 i1    C         1
8 Abby      2 i2    B         1
TarJae
  • 72,363
  • 6
  • 19
  • 66