1

In R, I have a time series ts_big in long format as shown below, with observations of type A and B:

ts1<-tibble(dates=c("2023-01-01","2023-02-01","2023-03-01",
                    "2023-04-01"), 
            numbers_1=c(1.0, 2.8, 2.9, 2.0), 
            numbers_2=c(3.0, 5.0, 7.9, 0.9),
            types=replicate(4, "A"))

ts2<-tibble(dates=c("2023-01-01","2023-02-01"), 
            numbers_1=c(0.3, 0.9),
            numbers_2=c(3.0, 5.0),
            types=replicate(2, "B"))

ts_big<-rbind(ts1, ts2)

Type A has observations at 4 times ("2023-01-01","2023-02-01","2023-03-01"), while type B has observations at only 2 of those same times ("2023-01-01","2023-02-01"). Is there a way to transform this tibble ts_big into a wide format with 5 columns: one single date column one column for type A numbers_1, one column for type A numbers_2, and one column for type B numbers_1, and one column for type B numbers_2 (perhaps with NA values in rows where type B doesn't have observation values for given dates)?

I understand this could be done manually by creating a bunch of vectors then compiling them into a tibble/dataframe but I was wondering if there was a function or simple process that could be used on much larger/complex examples than the small scale one I provided here.

Note: This is an reposted version of a previous post which was I improperly phrased.

James Rider
  • 633
  • 1
  • 9

3 Answers3

1

Double-pivot:

library(dplyr) # mostly just for %>%, though it's very useful with tidyr
library(tidyr)
ts_big %>%
  pivot_longer(cols = -c(dates, types)) %>%
  pivot_wider(id_cols = dates, names_from = c(types, name), values_from = value)
# # A tibble: 4 × 5
#   dates      A_numbers_1 A_numbers_2 B_numbers_1 B_numbers_2
#   <chr>            <dbl>       <dbl>       <dbl>       <dbl>
# 1 2023-01-01         1           3           0.3           3
# 2 2023-02-01         2.8         5           0.9           5
# 3 2023-03-01         2.9         7.9        NA            NA
# 4 2023-04-01         2           0.9        NA            NA
r2evans
  • 141,215
  • 6
  • 77
  • 149
  • thanks! Just one question: What is the id_cols=dates doing in the final line? – James Rider Aug 30 '23 at 02:25
  • `id_cols=` identifies the columns that are to be kept as-is, no pivoting. The other two arguments list columns names and values. Any column not included in those will be dropped. – r2evans Aug 30 '23 at 10:46
1

Here is another option without the need to pivot longer:

library(tidyr)
pivot_wider(ts_big, values_from = starts_with("numbers"), 
            names_from = "types", 
            names_glue = "{types}_{.value}" )


# A tibble: 4 × 5
  dates      A_numbers_1 B_numbers_1 A_numbers_2 B_numbers_2
  <chr>            <dbl>       <dbl>       <dbl>       <dbl>
1 2023-01-01         1           0.3         3             3
2 2023-02-01         2.8         0.9         5             5
3 2023-03-01         2.9        NA           7.9          NA
4 2023-04-01         2          NA           0.9          NA
Dave2e
  • 22,192
  • 18
  • 42
  • 50
1

read.zoo with the split= argument can do that. The fortify.zoo line converts the zoo output to data.frame and can be omitted if a zoo object is ok as the result. The setNames line is to make the column names a bit nicer and can be omitted if names of the form numbers_1.A are ok.

library(dplyr, exclude = c("filter", "lag"))
library(zoo)

ts_big %>%
  read.zoo(split = "types") %>%
  fortify.zoo(name = "date") %>%
  setNames(sub("numbers_(.*)\\.(.*)", "\\2_\\1", names(.)))
##         date A_1 A_2 B_1 B_2
## 1 2023-01-01 1.0 3.0 0.3   3
## 2 2023-02-01 2.8 5.0 0.9   5
## 3 2023-03-01 2.9 7.9  NA  NA
## 4 2023-04-01 2.0 0.9  NA  NA
G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341