I'm using RSQlite to import Datasets from an SQlite-Database. There are multiple millions of observations within the Database. Therefor I'd like to do as much as possible of Data selection and aggregation within the Database.
At some point I need to aggregate a character variable. I want to get the value which occures the most ordered by a group. How can I edit the following dplyr-chain so it works also with RSQlite?
library(tidyverse)
library(RSQLite)
# Path to Database
DATABASE="./xxx.db"
# Connect Database
mydb <- dbConnect(RSQLite::SQLite(), DATABASE)
# Load Database
data = tbl(mydb, "BigData")
# Query Database
Summary <- data %>%
filter(year==2020) %>%
group_by(Grouping_variable) %>%
summarize(count=n(),
Item_variable=names(which.max(table(Item_variable))))
Within R that code would do it's job. Querying the database I get an error code Error: near "(": syntax error
Original pipe contains more filters and steps.
Example Database would basically look like:
data.frame(Grouping_variable=c("A","A","B","C","C","C","D","D","D","D"),
year=c(2019,2020,2019,2020,2020,2020,2020,2020,2020,2021),
Item_variable=c("X","Y","Y","X","X","Y","Y","Y","X","X"))
Grouping_variable year Item_Variable
1 A 2019 X
2 A 2020 Y
3 B 2019 Y
4 C 2020 X
5 C 2020 X
6 C 2020 Y
7 D 2020 Y
8 D 2020 Y
9 D 2020 X
10 D 2021 X
Result should look like:
Grouping_variable count Item_variable
<chr> <int> <chr>
1 A 1 Y
2 C 3 X
3 D 3 Y