1

Suppose I am using panel data: for each individual and time, there is an observation of a numerical variable. I want to apply a function to this numerical variable but this function outputs a vector of numbers. I'd like to apply this function over the observations of each individual and store the resulting vector as columns of a new dataframe.


Example:

TICKER OFTIC CNAME            ANNDATS_ACT ACTUAL
   <chr>  <chr> <chr>            <date>       <dbl>
 1 0001   EPE   EP ENGR CORP     2019-05-08   -0.15
 2 0004   ACSF  AMERICAN CAPITAL 2014-08-04    0.29
 3 000R   CRCM  CARECOM          2018-02-27    0.32
 4 000V   EIGR  EIGER            2018-05-11   -0.84
 5 000Y   RARE  ULTRAGENYX       2016-02-25   -1.42
 6 000Z   BIOC  BIOCEPT          2018-03-28  -54   
 7 0018   EGLT  EGALET           2016-03-08   -0.28
 8 001A   SESN  SESEN BIO        2021-03-15   -0.11
 9 001C   ARGS  ARGOS            2017-03-16   -7   
10 001J   KN    KNOWLES          2021-02-04    0.38

For each TICKER, I will consider the time-series implied by ACTUAL and compute the autocorrelation function. I defined the following wrapper to perform the operation:

my_acf <- function(x, lag = NULL){
  acf_vec <- acf(x, lag.max = lag, plot = FALSE, na.action = na.contiguous)$acf
  acf_vec <- as.vector(acf_vec)[-1]
  return(acf_vec)
}

If the desired maximum lag is, say, 3, I'd like to create another dataset in which I have 4 columns: TICKER and the correspoding 3 first autocorrelations of the associated series of ACTUAL observations.

My solution was:

max_lag = 3
autocorrs <- final_sample %>% 
  group_by(TICKER) %>%
  filter(!all(is.na(ACTUAL))) %>%
  summarise(rho = my_acf(ACTUAL, lag = max_lag)) %>%
  mutate(order = row_number()) %>%
  pivot_wider(id_cols = TICKER, values_from = rho, names_from = order, names_prefix = "rho_")

This indeed provides the desired output:

TICKER rho_1   rho_2   rho_3
   <chr>  <dbl>   <dbl>   <dbl>
 1 0001   0.836  0.676   0.493 
 2 0004   0.469 -0.224  -0.366 
 3 000R   0.561  0.579   0.327 
 4 000V   0.634  0.626   0.604 
 5 000Y   0.370  0.396   0.117 
 6 000Z   0.476  0.454   0.382 
 7 0018   0.382 -0.0170 -0.278 
 8 001A   0.330  0.316   0.0944
 9 001C   0.727  0.590   0.400 
10 001J   0.281 -0.308  -0.0343

My question is how can one perform this operation without a pivot_wider and the manual creation of the order column? The summarise verb creates a single column that store the autocorrelations sequentially for each TICKER. Is there a way to force summarize to create different columns for the different output a given function may provide when applied to, let's say, the ACTUAL series?

Raul Guarini Riva
  • 651
  • 1
  • 10
  • 20

0 Answers0