0

I have a large data set that I need to reshape/melt. The problem that I have is that the first and second row are variable names (i.e., the first row is the id of a person and the second row lists four attributes for this person). In addition to this the first column captures the dates when the values were collected. To see a sample of my data set please take a look at Sheet1 in the following google sheet (https://docs.google.com/spreadsheets/d/19b_4hB6aM9JXReG67i9EF_sJVDHkNFHf4iShH8yrfOc/edit?usp=sharing). I would like to reshape my data to look like the Sheet2 in the above google sheet.

Is there a way of specifying that there are two IDs in my melt command. The first id is the first row (starting at the second column) and the second id is the first column (starting at the second row). If I could write a pseudo-R script for what I am trying to do I would write something like this.

melt(dt, id=c("Dates from A2:A6", "Person from B1:I1")

Thanks!

TCS
  • 127
  • 1
  • 11

3 Answers3

1

The following approach will do your job,

library(tidyverse)
name <- readr::read_csv("csvs1.csv", col_names = F, n_max = 2)

name
# A tibble: 2 x 9
  X1    X2       X3       X4       X5       X6       X7       X8       X9      
  <chr> <chr>    <chr>    <chr>    <chr>    <chr>    <chr>    <chr>    <chr>   
1 NA    person_A person_A person_A person_A person_B person_B person_B person_B
2 Dates var1     var2     var3     var4     var1     var2     var3     var4    

nm <- paste(names[1,], names[2,], sep = "-")
nm
[1] "NA-Dates"      "person_A-var1" "person_A-var2" "person_A-var3" "person_A-var4" "person_B-var1" "person_B-var2"
[8] "person_B-var3" "person_B-var4"

data <- readr::read_csv("csvs1.csv", col_names = F, skip = 2)

names(data) <- nm
data
# A tibble: 4 x 9
  `NA-Dates` `person_A-var1` `person_A-var2` `person_A-var3` `person_A-var4` `person_B-var1` `person_B-var2`
  <date>               <dbl>           <dbl>           <dbl>           <dbl>           <dbl>           <dbl>
1 2021-05-01            45.0            43.9            67.9            35.7            40.5            38.6
2 2021-05-02            50.4            52.6            67.9            41.9            41.7            43.9
3 2021-05-03            53.7            49.1            67.9            49.6            41.3            38.6
4 2021-05-04            53.7            52.6            73.2            45.7            29.8            33.3
# ... with 2 more variables: person_B-var3 <dbl>, person_B-var4 <dbl>

data %>% 
+   pivot_longer(!`NA-Dates`, names_to = c('person', '.value'), names_sep = '-', names_prefix = 'person_')

# A tibble: 8 x 6
  `NA-Dates` person  var1  var2  var3  var4
  <date>     <chr>  <dbl> <dbl> <dbl> <dbl>
1 2021-05-01 A       45.0  43.9  67.9  35.7
2 2021-05-01 B       40.5  38.6  60.7  32.6
3 2021-05-02 A       50.4  52.6  67.9  41.9
4 2021-05-02 B       41.7  43.9  42.9  40.3
5 2021-05-03 A       53.7  49.1  67.9  49.6
6 2021-05-03 B       41.3  38.6  48.2  39.5
7 2021-05-04 A       53.7  52.6  73.2  45.7
8 2021-05-04 B       29.8  33.3  57.1  16.3
AnilGoyal
  • 25,297
  • 4
  • 27
  • 45
  • Thanks! Can you help me understand why you need to negate the date variable. I just want to understand why you need the ! in front of the `NA-Dates` – TCS May 14 '21 at 18:21
  • Because all columns except `NA-dates` were to be pivoted. So instead of writing names of all columns to be pivoted I simply used names of columns not to be pivoted. – AnilGoyal May 14 '21 at 18:26
0

Here is a data.table approach using readr::type_convert for convenience:

df <- structure(list(
    person_A = c("var1", "45.0413", "50.4132", "53.719", "53.719"),
    person_A = c("var2", "43.8596", "52.6316", "49.1228", "52.6316"),
    person_A = c("var3", "67.8571", "67.8571", "67.8571", "73.2143"),
    person_A = c("var4", "35.6589", "41.8605", "49.6124", "45.7364"),
    person_B = c("var1", "40.4959", "41.7355", "41.3223", "29.7521"),
    person_B = c("var2", "38.5965", "43.8596", "38.5965", "33.3333"),
    person_B = c("var3", "60.7143", "42.8571", "48.2143", "57.1429"),
    person_B = c("var4", "32.5581", "40.3101", "39.5349", "16.2791")),
    class = "data.frame",
    row.names = c("Dates", "2021-05-01", "2021-05-02", "2021-05-03", "2021-05-04")
    )

library(data.table)

# split data.frame by person
dfl <- split.default(df, sub('\\d+', '', names(df)))

# re-define column labels and types based on first row
dfl <- lapply(dfl, function(x) {
    setnames(readr::type_convert(data.table(x[-1,], 
    keep.rownames = TRUE)), 
    as.character(data.table(x[1,], keep.rownames = TRUE)))})

# combine list elements and reorder columns
setcolorder(rbindlist(dfl, idcol = "Person"), c(2,1))[]
#>         Dates   Person    var1    var2    var3    var4
#> 1: 2021-05-01 person_A 45.0413 43.8596 67.8571 35.6589
#> 2: 2021-05-02 person_A 50.4132 52.6316 67.8571 41.8605
#> 3: 2021-05-03 person_A 53.7190 49.1228 67.8571 49.6124
#> 4: 2021-05-04 person_A 53.7190 52.6316 73.2143 45.7364
#> 5: 2021-05-01 person_B 40.4959 38.5965 60.7143 32.5581
#> 6: 2021-05-02 person_B 41.7355 43.8596 42.8571 40.3101
#> 7: 2021-05-03 person_B 41.3223 38.5965 48.2143 39.5349
#> 8: 2021-05-04 person_B 29.7521 33.3333 57.1429 16.2791

Created on 2021-05-05 by the reprex package (v2.0.0)

user12728748
  • 8,106
  • 2
  • 9
  • 14
0

In order to use the base reshape it's important to combine levels in one string, separate them with a point and rename your variables.

Using the data.frame from above.

# Get variables names
varA <- as.character(df[1,])
varB <- names(df)
newNames <- paste(varA, varB, sep = '.')  # don't forget the point

# Drop first line, rename variables and pass rownames to a column
df <- df[-1,] 
names(df) <- newNames
df$dates <- row.names(df)    # added after last column

# reshape
lastVar <- ncol(df)-1        # because of rownames in last column
df <- reshape(df, varying = 1:lastVar, direction = 'long')

# and beautify
row.names(df) <- NULL
names(df)[names(df) == 'time'] <- 'person'
df <- df[, -ncol(df)]
df

>        dates   person    var1    var2    var3    var4
> 1 2021-05-01 person_A 45.0413 43.8596 67.8571 35.6589
> 2 2021-05-02 person_A 50.4132 52.6316 67.8571 41.8605
> 3 2021-05-03 person_A  53.719 49.1228 67.8571 49.6124
> 4 2021-05-04 person_A  53.719 52.6316 73.2143 45.7364
> 5 2021-05-01 person_B 40.4959 38.5965 60.7143 32.5581
> 6 2021-05-02 person_B 41.7355 43.8596 42.8571 40.3101
> 7 2021-05-03 person_B 41.3223 38.5965 48.2143 39.5349
> 8 2021-05-04 person_B 29.7521 33.3333 57.1429 16.2791

olitroski
  • 46
  • 3