With dplyr's `across()`, it is very easy to summarize numeric variables grouped by character variables, without typing any variable names. `across()` also works on a data.table, but then it is slower: it runs at ordinary dplyr speed, as if manipulating a plain data frame.

Is there any way to keep both the speed and the convenience? Thanks!

library(data.table)
library(dplyr)
test_data <- data.table(x=c('a','b','c','a','b','c'),y=c('d','e','f','d','e','f'),a=c(1:6),b=c(1,7,NA,3,5,6),c=c(NA,3,NA,4,7,8))
test_data %>% group_by(across(where(is.character))) %>% 
  summarise(across(where(is.numeric), function(x) sum(x, na.rm = TRUE)))
M--
anderwyang
  • (1) You're using `dplyr` verbs on a `data.table`. Unless you're also loading the `dtplyr`, it will not do things in a canonical/fast `data.table`-manner, so it will be "normal dplyr fast" (not data.table-fast). (2) `is.number` is not found, are you loading any other packages? (3) What makes you think this is slow? A `data.table`-canonical method could be `test_data[, lapply(.SD, sum, na.rm = TRUE), by = c(names(test_data)[sapply(test_data, is.character)])]`. – r2evans Sep 14 '22 at 12:15
  • Thanks for your reply, I will try loading the dtplyr package in the meantime – anderwyang Sep 14 '22 at 12:25
  • `is.number` supposed to be `is.numeric`? If not, what package is it from? – s_baldur Sep 14 '22 at 12:50
  • It's `is.numeric`, sorry for the typo – anderwyang Sep 14 '22 at 13:02

1 Answer

Working directly with data.table:

char_columns <- sapply(test_data, is.character) |> which() |> names()

test_data[, lapply(.SD, sum, na.rm = TRUE), .SDcols = is.numeric, by = char_columns]

#    x y a  b  c
# 1: a d 5  4  4
# 2: b e 7 12 10
# 3: c f 9  6  8
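
If this pattern is needed for several tables, the same call can be wrapped in a small function. This is only a sketch: `sum_numeric_by_character()` is a hypothetical helper name, not part of data.table, and it assumes a data.table version that accepts a function for `.SDcols` (as the call above does).

```r
library(data.table)

# Hypothetical helper (not part of data.table): sums every numeric column,
# grouped by every character column, without naming any column explicitly.
sum_numeric_by_character <- function(dt) {
  stopifnot(is.data.table(dt))
  # Names of the character columns, used as grouping keys
  char_columns <- names(dt)[vapply(dt, is.character, logical(1))]
  # .SDcols = is.numeric restricts .SD to the numeric columns
  dt[, lapply(.SD, sum, na.rm = TRUE), .SDcols = is.numeric, by = char_columns]
}

test_data <- data.table(
  x = c('a', 'b', 'c', 'a', 'b', 'c'),
  y = c('d', 'e', 'f', 'd', 'e', 'f'),
  a = 1:6,
  b = c(1, 7, NA, 3, 5, 6),
  c = c(NA, 3, NA, 4, 7, 8)
)

result <- sum_numeric_by_character(test_data)
```

Because `na.rm = TRUE` is passed through `lapply()` to `sum()`, groups containing `NA` still get a total, as in the output above.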
s_baldur