
I am interpolating a time series and need to apply the approx function to a data frame with 4 columns and 172,660 rows, split into 4 groups (so 43,165 rows per group). There are currently two answers about this: one using summarise, but interpolating just one column, and one using data.table. The first approach does work, but not for my purpose. I also noticed that mutate_at, for example, is superseded by mutate(across()), so I tried a more up-to-date approach, but it's not working.

library(tidyverse)

# Coarse series: one observation every 0.72 time units, 1200 points per category
tabela_1 <- tibble(x1 = rnorm(4800, mean = 88.5, sd = 4),
                   x2 = rnorm(4800, mean = -38.526, sd = 2.758),
                   x3 = rnorm(4800, mean = -22.6852, sd = 1.8652),
                   x4 = rnorm(4800, mean = -38.526, sd = 2.758),
                   tmpts = rep(x = seq(from = 0, to = 863.28, by = 0.72), 
                               times = 4),
                   category = rep(x = 1:4, each = 1200))

# Fine time grid: one point every 0.02 time units, 43165 points per category
tabela <- tibble(tmpts = rep(x = seq(from = 0, to = 863.28, by = 0.02), 
                             times = 4),
                 category = rep(x = 1:4, each = 43165))

# Place the coarse observations on the fine grid; unmatched rows get NA in x1:x4
tabela_joined <- tabela %>% 
  left_join(tabela_1, by = c("tmpts", "category")) %>% 
  arrange(category, tmpts) %>% 
  janitor::clean_names()

# Attempt to interpolate every x column within each category (this call fails)
tabela_interpolation <- tabela_joined %>% 
  group_by(category) %>%
  summarize(across(.cols = x1:x4, approx(., n = 43165)))

When I run the tabela_interpolation step, I get:

Error: Problem with `summarise()` input `..1`.
i `..1 = across(.cols = x1:x15, approx(., n = 43165))`.
x Can't convert an integer vector to function
i The error occurred in group 1: run = 1.
Run `rlang::last_error()` to see where the error occurred.
In addition: Warning message:
In regularize.values(x, y, ties, missing(ties), na.rm = na.rm) :
  collapsing to unique 'x' values

How should I use summarise with across to get the time series interpolated by approx for each column of the data frame?

Juliana

1 Answer


You can use the across syntax as follows:

library(tidyverse)

tabela_joined %>% 
  group_by(category) %>%
  summarize(across(x1:x4, approx, n = 43165)) %>%
  ungroup

Or

tabela_joined %>% 
  group_by(category) %>%
  summarize(across(x1:x4, ~approx(., n = 43165))) %>%
  ungroup

This can be followed by unnest() to get the complete, expanded data frame.

tabela_joined %>% 
  group_by(category) %>%
  summarize(across(x1:x4, approx, n = 43165)) %>%
  ungroup %>%
  unnest(x1:x4)

#   category    x1    x2    x3    x4
#      <int> <dbl> <dbl> <dbl> <dbl>
# 1        1     1     1     1     1
# 2        1     2     2     2     2
# 3        1     3     3     3     3
# 4        1     4     4     4     4
# 5        1     5     5     5     5
# 6        1     6     6     6     6
# 7        1     7     7     7     7
# 8        1     8     8     8     8
# 9        1     9     9     9     9
#10        1    10    10    10    10
# … with 345,310 more rows
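A note on the output above: as far as I can tell, the values 1, 2, 3, ... in every column are the x component of approx()'s result, not interpolated data. approx() returns a list with elements x and y, so summarise() gives two rows per group (one holding the x vectors, one holding the y vectors) and unnest() stacks them, which is why the result has 2 × 4 × 43,165 = 345,320 rows (10 printed plus 345,310 more). Extracting $y, as Juliana does in the comments, avoids this. A base-R sketch with a hypothetical toy vector v makes the structure visible:

```r
# Hypothetical toy series with gaps, standing in for one x column of one group
v <- c(10, NA, 30, NA, NA, 60)

# Called with a single vector, approx() treats v as the y values and uses the
# index seq_along(v) as x; NA values are dropped before interpolating
res <- approx(v, n = 6)

str(res)   # a list with two numeric components, x and y
res$x      # the index positions at which v was interpolated
res$y      # the interpolated values themselves
```

Because the single-vector call interpolates against the row index, it also ignores the actual time stamps in tmpts; that only matters if the grid is not evenly spaced.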
Ronak Shah
  • Thank you so much for your answer. I will test this in a minute and give feedback! Just to make sure I understood the code: when using a function inside across, I should not put the arguments inside parentheses, unless I use a tilde before the function? – Juliana Jul 19 '21 at 03:42
  • `across` accepts additional arguments for the function through `...`, so you can do `across(x1:x4, approx, n = 43165)`. If you want to call the function as `fun()`, you need the tilde (`~`) before it. – Ronak Shah Jul 19 '21 at 03:54
  • It worked exactly as I wanted. I just want to add one thing: since the output of the `approx` function is a `list` object, only the `across(x1:x4, ~approx(., n = 43165))` form worked, because I had to specify that I wanted just the `y` element of the list, like this: `summarize(across(.cols = x1:x4, ~approx(., n = 43165)$y))`. Thank you so much for explaining. – Juliana Jul 19 '21 at 04:25