1

I am trying to use the collect() function from dplyr to query a table (comp_fleet) in SQLite. The following code works:

Cposns <- tbl(db_con,"comp_fleet") %>% 
  arrange(mmsi, desc(timestamp))%>% 
  dplyr::filter(!is.na(lat)) %>% collect()

However when I try top_n(), I get an error:

Cposns <- tbl(db_con,"comp_fleet") %>% 
  arrange(mmsi, desc(timestamp))%>% 
  dplyr::filter(!is.na(lat)) %>% 
  top_n(1,timestamp) %>% collect()

Error: Window function `rank()` is not supported by this database

Since this is a huge table, I was hoping to carry out all aggregation functions within the database before collecting. Is there some way of using the top_n() function prior to collect()? Or something equivalent to it?

Dhiraj
  • 1,650
  • 1
  • 18
  • 44
  • 3
    Have you tried to use `head()` instead of `top_n()`?. Your dataframe is not grouped, so you probably don't need `top_n()`. – Scarabee Jul 10 '18 at 11:55
  • @Scarabee thanks for your comment, just realized that I did need a `group_by()` – Dhiraj Jul 10 '18 at 12:04

1 Answers1

1

Thanks to @Scarabee, this is what works:

Cposns <- tbl(db_con,"comp_fleet") %>%
  arrange(mmsi, desc(timestamp))%>%
  dplyr::filter(!is.na(lat)) %>% group_by(mmsi) %>%
  do(head(., n = 1)) %>% collect()
Dhiraj
  • 1,650
  • 1
  • 18
  • 44