0

I am new to SQL and am using MonetDB to load a large file into R studio. I have loaded my data as a db using monetDB, and would now like to execute the R code below on the data in this database:

my_selection <- db_data %>% group_by(id) %>% 
  tally(. , sort = TRUE) %>% 
  top_n(100) %>% 
  select(id) 

Basically, I want to group my data by "id", tally and sort it, and select the 100 largest elements in it. What would be an equivalent of this in SQL?

I am executing queries in the following way in R:

my_selection <- dbGetQuery(connection,"SELECT * FROM my_table [INSERT REST OF CODE HERE]") 

1 Answers1

0

That's depends on the DBMS you're using ,

SQL-Server :

SELECT TOP 100 id,sum(YourOtherColumn) as sum_c
FROM YourTable
GROUP BY id
ORDER BY sum_c DESC

MySQL :

SELECT id,sum(YourOtherColumn) as sum_c
FROM YourTable
GROUP BY id
ORDER BY sum_c DESC
LIMIT 100

If it's something else, tell me and I'll edit the answer.

sagi
  • 40,026
  • 6
  • 59
  • 84
  • I am using MySQL, in the following way in R: my_selection <- dbGetQuery(connection,"SELECT * FROM my_table [INSERT REST OF CODE HERE]") –  Jun 28 '16 at 10:53
  • Thanks! I have 40 other columns. How would this translate into sum(YourOtherColumn)? Also, pardon my ignorance but what do you mean by sum_c ? –  Jun 28 '16 at 10:57
  • `sum_c` is an alias, a nickname for the column. I don't know if you need sum or not, did you intend to do it on `R` ? if not, just write the `SELECT ID,column1,column2,column3...` and `ORDER BY ` @wsp1morlet – sagi Jun 28 '16 at 10:59