Suppose I am using panel data: for each individual and time, there is an observation of a numerical variable. I want to apply a function to this numerical variable but this function outputs a vector of numbers. I'd like to apply this function over the observations of each individual and store the resulting vector as columns of a new dataframe.
Example:
TICKER OFTIC CNAME ANNDATS_ACT ACTUAL
<chr> <chr> <chr> <date> <dbl>
1 0001 EPE EP ENGR CORP 2019-05-08 -0.15
2 0004 ACSF AMERICAN CAPITAL 2014-08-04 0.29
3 000R CRCM CARECOM 2018-02-27 0.32
4 000V EIGR EIGER 2018-05-11 -0.84
5 000Y RARE ULTRAGENYX 2016-02-25 -1.42
6 000Z BIOC BIOCEPT 2018-03-28 -54
7 0018 EGLT EGALET 2016-03-08 -0.28
8 001A SESN SESEN BIO 2021-03-15 -0.11
9 001C ARGS ARGOS 2017-03-16 -7
10 001J KN KNOWLES 2021-02-04 0.38
For each TICKER
, I will consider the time-series implied by ACTUAL
and compute the autocorrelation function. I defined the following wrapper to perform the operation:
my_acf <- function(x, lag = NULL){
acf_vec <- acf(x, lag.max = lag, plot = FALSE, na.action = na.contiguous)$acf
acf_vec <- as.vector(acf_vec)[-1]
return(acf_vec)
}
If the desired maximum lag is, say, 3, I'd like to create another dataset in which I have 4 columns: TICKER
and the correspoding 3 first autocorrelations of the associated series of ACTUAL
observations.
My solution was:
max_lag = 3
autocorrs <- final_sample %>%
group_by(TICKER) %>%
filter(!all(is.na(ACTUAL))) %>%
summarise(rho = my_acf(ACTUAL, lag = max_lag)) %>%
mutate(order = row_number()) %>%
pivot_wider(id_cols = TICKER, values_from = rho, names_from = order, names_prefix = "rho_")
This indeed provides the desired output:
TICKER rho_1 rho_2 rho_3
<chr> <dbl> <dbl> <dbl>
1 0001 0.836 0.676 0.493
2 0004 0.469 -0.224 -0.366
3 000R 0.561 0.579 0.327
4 000V 0.634 0.626 0.604
5 000Y 0.370 0.396 0.117
6 000Z 0.476 0.454 0.382
7 0018 0.382 -0.0170 -0.278
8 001A 0.330 0.316 0.0944
9 001C 0.727 0.590 0.400
10 001J 0.281 -0.308 -0.0343
My question is how can one perform this operation without a pivot_wider and the manual creation of the order
column? The summarise
verb creates a single column that store the autocorrelations sequentially for each TICKER
. Is there a way to force summarize to create different columns for the different output a given function may provide when applied to, let's say, the ACTUAL
series?