aggregate and reshape dataframe in r

Question

I want to take a long dataframe and make it wide. Here is an example:

df <- data.frame(date = rep(seq(as.Date("1990/1/1"), as.Date("1999/1/1"), "years"),10), price = seq.int(1, 100), type = c(rep("str",10), rep("str2",10), rep("chr",10), rep("chr2",10), rep("num",10), rep("num2",10), rep("posix",10), rep("posix2",10), rep("date",10), rep("date2",10)))

I want each column to be a different date, but as you can see each of the type column has its own set of dates from 1990 to 1999. I just want one column for each unique date and then one row for every type. Then the entries in the [i,j]th cell would be the price of that type on that day of the year.

So I can imagine it looking like:

type,1990-1-1, 1991-1-1, ..., 1999-1-1

num, 1, 2, ... , 10

chr, 11, 12, ..., 20

...

date, 91, 92, ..., 100

score 1 · Accepted Answer · edited Jan 13 '18 at 12:18

library(dplyr)
library(tidyr)

df%>%
  arrange(date, type)%>%
  group_by(date, type)%>%
  slice(1)%>%
  spread(date, price)

# A tibble: 5 x 11
# Groups: type [5]
  type   `1990-01-01` `1991-01-01` `1992-01-01` `1993-01-01` `1994-01-01` `1995-01-01` `1996-01-01` `1997-01-01` `1998-01-01` `1999-01-01`
* <fctr>        <int>        <int>        <int>        <int>        <int>        <int>        <int>        <int>        <int>        <int>
1 chr              21           22           23           24           25           26           27           28           29           30
2 date             81           82           83           84           85           86           87           88           89           90
3 num              41           42           43           44           45           46           47           48           49           50
4 posix            61           62           63           64           65           66           67           68           69           70
5 str               1            2            3            4            5            6            7            8            9           10

Right now, Nicolas is producing duplicates in every single row and column. You'll have to remove them as you can't store vectors (as specified in the error in Nicolas's answer) in a spot where a single numeric value is needed.

I'll mark this correct, but if you want you should check out another question I just posted which extends the same problem to my actual problem. https://stackoverflow.com/questions/48235088/collapse-dataframe-in-r-by-factor-with-nas — conv3d, Jan 12 '18 at 22:52

score 0 · Answer 2 · edited Jan 15 '18 at 07:54

0

Easy to do with tidyr:

library(tidyr)
spread(df, key = date, value = price)

     type 1990-01-01 1991-01-01 1992-01-01 1993-01-01 1994-01-01 1995-01-01 1996-01-01 1997-01-01
1     chr         21         22         23         24         25         26         27         28
2    chr2         31         32         33         34         35         36         37         38
3    date         81         82         83         84         85         86         87         88
4   date2         91         92         93         94         95         96         97         98
5     num         41         42         43         44         45         46         47         48
6    num2         51         52         53         54         55         56         57         58
7   posix         61         62         63         64         65         66         67         68

edited Jan 15 '18 at 07:54

Axeman

32,068
8
81
94

answered Jan 12 '18 at 21:34

Nicolás Velasquez

5,623
11
22

2

when i run this code, i get the error `Error: Duplicate identifiers for rows (21, 31), (81, 91), (41, 51), (61, 71), (1, 11), (22, 32), (82, 92), (42, 52), (62, 72), (2, 12), (23, 33), (83, 93), (43, 53), (63, 73), (3, 13), (24, 34), (84, 94), (44, 54), (64, 74), (4, 14), (25, 35), (85, 95), (45, 55), (65, 75), (5, 15), (26, 36), (86, 96), (46, 56), (66, 76), (6, 16), (27, 37), (87, 97), (47, 57), (67, 77), (7, 17), (28, 38), (88, 98), (48, 58), (68, 78), (8, 18), (29, 39), (89, 99), (49, 59), (69, 79), (9, 19), (30, 40), (90, 100), (50, 60), (70, 80), (10, 20)` – InfiniteFlash Jan 12 '18 at 21:36
I get the same as @InfiniteFlashChess – conv3d Jan 12 '18 at 21:43
4

Two points: (1) spread is from the "tidyr" package (not "tidyverse", as incorrectly mentioned in [this other answer]() or "dplyr" as in your answer. (2) Duplicate identifiers usually mean that if you run `duplicated` on the final row and to-be-column variables, you'll find duplicated values. @InfiniteFlashChess took just the first value with `slice`. Another option might be to add a secondary ID, with something like `df %>% group_by(date, type) %>% mutate(id = sequence(n())) %>% unite(key, type, id) %>% spread(date, price)`. – A5C1D2H2I1M1N2O1R2T1 Jan 13 '18 at 09:49

aggregate and reshape dataframe in r

2 Answers2