2

I'm trying to reshape a data frame in R:

  Gene_ID Value Gene_ID.1 Value.1 Gene_ID.2 Value.2
1       A     0         A       3         A       1
2       B     5         B       6         B       5
3       C     7         C       2         C       7
4       D     8         D       9         D       2
5       E     5         E       8         E       4
6       F     6         F       4         F       5

I want to make it look like this:

   Gene_ID Value
1        A     0
2        B     5
3        C     7
4        D     8
5        E     5
6        F     6
7        A     1
8        B     5
9        C     7
10       D     2
11       E     4
12       F     5
13       A     3
14       B     6
15       C     2
16       D     9
17       E     8
18       F     4

So simply stack the columns with the same names together. Is there a way to do so? Thanks!

Tung
  • 26,371
  • 7
  • 91
  • 115
a.sandra
  • 21
  • 3

1 Answers1

3

You can use either the combination of gather()/spread() or pivot_longer() from the tidyr package.

To learn more about the new pivot_xxx() functions, check out these links:

library(dplyr)
library(tidyr)

txt <- "  Gene_ID.0 Value.0 Gene_ID.1 Value.1 Gene_ID.2 Value.2
1       A     0         A       3         A       1
2       B     5         B       6         B       5
3       C     7         C       2         C       7
4       D     8         D       9         D       2
5       E     5         E       8         E       4
6       F     6         F       4         F       5"

dat <- read.table(text = txt, header = TRUE)

Combine gather(), separate() and spread() functions

dat %>%
  mutate(Row_Nr = row_number()) %>%
  gather(key, value, -Row_Nr) %>%
  separate(key, into = c("key", "Gene_Nr"), sep = "\\.") %>% 
  spread(key, value) %>%
  select(-Row_Nr)
#> Warning: attributes are not identical across measure variables;
#> they will be dropped
#>    Gene_Nr Gene_ID Value
#> 1        0       A     0
#> 2        1       A     3
#> 3        2       A     1
#> 4        0       B     5
#> 5        1       B     6
#> 6        2       B     5
#> 7        0       C     7
#> 8        1       C     2
#> 9        2       C     7
#> 10       0       D     8
#> 11       1       D     9
#> 12       2       D     2
#> 13       0       E     5
#> 14       1       E     8
#> 15       2       E     4
#> 16       0       F     6
#> 17       1       F     4
#> 18       2       F     5

Use pivot_longer()

### gather all values columns
### separate original column names by the period "."
### into Gene_ID/Value and Gene_Nr
dat %>% 
  pivot_longer(everything(),
               names_to = c(".value", "Gene_Nr"),
               names_pattern = "(.*)\\.(.*)")
#>    Gene_Nr Gene_ID Value
#> 1        0       A     0
#> 2        1       A     3
#> 3        2       A     1
#> 4        0       B     5
#> 5        1       B     6
#> 6        2       B     5
#> 7        0       C     7
#> 8        1       C     2
#> 9        2       C     7
#> 10       0       D     8
#> 11       1       D     9
#> 12       2       D     2
#> 13       0       E     5
#> 14       1       E     8
#> 15       2       E     4
#> 16       0       F     6
#> 17       1       F     4
#> 18       2       F     5

Created on 2019-12-08 by the reprex package (v0.3.0)

Tung
  • 26,371
  • 7
  • 91
  • 115
  • Forgot mentioning that you'll have to manually change the first 2 columns to `Gene_ID.0` and `Value.0` to make it work. Or do something like this `dat <- dat %>% rename(Gene_ID.0 = Gene_ID, Value.0 = Value)` – Tung Dec 09 '19 at 04:46