0

I have a dataset in the wide format which looks like this:

header 1      x    x    x    y    y    z    z
header 2      1    2    3    1    2    1    2 
 2015         3.0  4.0  2.0  3.1  2.1  2.3  2.1
 2016         2.0  4.4  1.0  2.1  3.1  5.3  0.1  
 ...

I want to change it to A long format which looks like this:

header 1      variable   1    2    3
 2015            x       3.0  4.0  2.0
 2015            y       3.1  2.1  N/A
 2015            Z       2.3  2.1  N/A
 2016           ...

This was my approach:

  1. concatenate header 1 and header two with a separator
  2. use melt() function, and
  3. split the two

But, this is not working as the melt function changes my characters to factor and then I can't split them. Can anyone help or suggest a better approach?

Jaap
  • 81,064
  • 34
  • 182
  • 193
Piyush Ahuja
  • 181
  • 3
  • 7
  • 1
    Can you make a reproducible example and include the code and desired output? – Roman Luštrik Sep 07 '15 at 11:51
  • You can `as.character` your factor variables back to character. this is also really doable with `tidyr` but there's no code in your post to work from. – hrbrmstr Sep 07 '15 at 12:17

2 Answers2

0

Data

d <- read.table(header = FALSE, stringsAsFactors = FALSE,
                text = "header1      x    x    x    y    y    z    z
header2      1    2    3    1    2    1    2 
 2015         3.0  4.0  2.0  3.1  2.1  2.3  2.1
 2016         2.0  4.4  1.0  2.1  3.1  5.3  0.1")

Concatenate first two rows

names(d)[-1] <- paste(d[1, -1], d[2, -1], sep = ".")
d.new <- d[-(1:2), ]

Melt & Cast

d.m <- melt(d.new, "V1", variable_name = "var")
cast(cbind(d.m, colsplit(d.m$var, split = "\\.", names = c("variable",
    "number"))), V1 + variable  ~ number)

#     V1 variable   1   2    3
# 1 2015        x 3.0 4.0  2.0
# 2 2015        y 3.1 2.1 <NA>
# 3 2015        z 2.3 2.1 <NA>
# 4 2016        x 2.0 4.4  1.0
# 5 2016        y 2.1 3.1 <NA>
# 6 2016        z 5.3 0.1 <NA>
thothal
  • 16,690
  • 3
  • 36
  • 71
0

Or with the hadleyverse:

read.table(text="header_1      x    x    x    y    y    z    z
header_2      1    2    3    1    2    1    2
 2015         3.0  4.0  2.0  3.1  2.1  2.3  2.1
 2016         2.0  4.4  1.0  2.1  3.1  5.3  0.1", 
           header=FALSE, stringsAsFactors=FALSE) -> dat

# deal with "headers"
colnames(dat) <- c("year", sprintf("%s %s", dat[1,], dat[2,])[-1])

# reshape the data
gather(dat[-(1:2),], variable, value, -year) %>%
  separate(variable, c("variable", "i")) %>%
  spread(i, value)

##   year variable   1   2    3
## 1 2015        x 3.0 4.0  2.0
## 2 2015        y 3.1 2.1 <NA>
## 3 2015        z 2.3 2.1 <NA>
## 4 2016        x 2.0 4.4  1.0
## 5 2016        y 2.1 3.1 <NA>
## 6 2016        z 5.3 0.1 <NA>
hrbrmstr
  • 77,368
  • 11
  • 139
  • 205