Retrieve all rows with same minimum value for a column with sqldf

Question

I need to retrieve the IDs of the employees who have completed the minimum number of jobs. Several employees have completed just 1 job, but my current sqldf query returns only 1 row. Why does it stop at the first minimum value, and how do I fetch all rows with the minimum value in a column? Here is a data sample:

ID  TaskCount
1    74
2    53
3    10
4     5
5     1
6     1
7     1

The code I have used:

sqldf("select id, min(taskcount) as Jobscompleted
       from (select id,count(id) as taskcount 
            from MyData
            where id is not null 
            group by id order by id)")

Output is

ID   Jobscompleted
5     1

What I want is all the rows with the minimum number of jobs completed:

ID  Jobscompleted
5     1
6     1 
7     1

G. Grothendieck · Accepted Answer · score 2 · 2017-11-05T02:42:46.190

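Without a GROUP BY on the outer query, min(...) collapses the whole result to a single row, as do all SQL aggregate functions. SQLite, the default sqldf backend, then fills in ID from one of the rows that holds the minimum, which is why you see only one ID. Try this instead: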

library(sqldf)
sqldf("select ID, TaskCount TasksCompleted from MyData 
       where TaskCount = (select min(TaskCount) from MyData)")

giving:

   ID TasksCompleted
1  5              1
2  6              1
3  7              1

Note: The input in reproducible form is:

Lines <- "
ID  TaskCount
1    74
2    53
3    10
4     5
5     1
6     1
7     1"
MyData <- read.table(text = Lines, header = TRUE)
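
The query in the question builds its counts with count(id) ... group by id, while the answer above starts from a table that already has a TaskCount column. Here is a minimal sketch of combining the two steps. It assumes a hypothetical data frame RawJobs with one row per completed job; RawJobs, Counts and LeastBusy are illustrative names that do not appear in the question:

```r
library(sqldf)

# Hypothetical raw data: one row per completed job (NA = job with no employee).
RawJobs <- data.frame(id = c(1, 1, 1, 2, 2, 3, 4, NA))

# Step 1: count jobs per employee, as the subquery in the question does.
Counts <- sqldf("select id, count(id) as taskcount
                 from RawJobs
                 where id is not null
                 group by id")

# Step 2: keep every employee whose count equals the overall minimum.
LeastBusy <- sqldf("select id, taskcount as JobsCompleted
                    from Counts
                    where taskcount = (select min(taskcount) from Counts)
                    order by id")

print(LeastBusy)
```

With this made-up input, employees 3 and 4 each have one job, so both rows are kept.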

edited Nov 05 '17 at 02:42

answered Nov 05 '17 at 02:03

G. Grothendieck

... and changing the column name would be done with `sqldf("SELECT ID, TaskCount AS JobsCompleted FROM MyData WHERE TaskCount = (SELECT MIN(TaskCount) FROM MyData)")` – vaettchen Nov 05 '17 at 02:10
This works. Thank you! But can you explain to me why? My query only retrieved one row, I believe I am missing an important subquery concept. – pyeR_biz Nov 05 '17 at 02:13
As stated in the answer `min` always returns a single row. – G. Grothendieck Nov 05 '17 at 02:16
@G.Grothendieck So when you use `where taskcount = (select min(..))`, how is it different from just `select min(..)`? – pyeR_biz Nov 05 '17 at 02:45
Yes, that subselect always returns one row. – G. Grothendieck Nov 05 '17 at 02:46
@G.Grothendieck Can you tell me what kind of subquery your statement uses, so I can study it? Single-row, multiple-row, or correlated? – pyeR_biz Nov 05 '17 at 03:05
Try googling for `sql subquery`. Also `sql aggregate`. Or try `sqlite` or `h2` in place of `sql`. – G. Grothendieck Nov 05 '17 at 11:41
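
To make the difference raised in the comments concrete, it may help to run the two pieces separately. In the sketch below the names min_only and all_min are illustrative. The inner select is an aggregate with no GROUP BY, so it yields a single value. The outer select is not an aggregate, so it compares each row with that value and keeps every match. This pattern is commonly described as a scalar, non-correlated subquery: it returns one value and does not refer to the outer query.

```r
library(sqldf)

# Same sample data as in the question.
MyData <- data.frame(ID = 1:7, TaskCount = c(74, 53, 10, 5, 1, 1, 1))

# The inner query on its own: an aggregate without GROUP BY,
# so the result has exactly one row holding the minimum.
min_only <- sqldf("select min(TaskCount) as MinCount from MyData")

# The full query: the outer select is a plain row filter that
# compares every TaskCount against the single value from the subquery.
all_min <- sqldf("select ID, TaskCount from MyData
                  where TaskCount = (select min(TaskCount) from MyData)")

print(min_only)
print(all_min)
```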

score 0 · Answer 2 · answered Nov 05 '17 at 02:24

As an alternative to sqldf, you could use data.table:

library(data.table)
dt <- data.table(ID=1:7, TaskCount=c(74, 53, 10, 5, 1, 1, 1))

dt[TaskCount==min(TaskCount)]

##    ID TaskCount
## 1:  5         1
## 2:  6         1
## 3:  7         1
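
For comparison, the same filter can be sketched in base R with no extra packages. The na.rm = TRUE argument and the which() wrapper are added precautions for missing counts; the sample data has none, and the names lowest and MinRows are illustrative:

```r
# Same sample data as above, as a plain data frame.
MyData <- data.frame(ID = 1:7, TaskCount = c(74, 53, 10, 5, 1, 1, 1))

# Compute the minimum once, ignoring any missing counts.
lowest <- min(MyData$TaskCount, na.rm = TRUE)

# which() drops comparisons that evaluate to NA, so only true matches remain.
MinRows <- MyData[which(MyData$TaskCount == lowest), ]

print(MinRows)
```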

answered Nov 05 '17 at 02:24

dnlbrky
