2

I have to retrieve IDs for employees who have completed the minimum number of jobs. There are multiple employees who have completed 1 job. My current sqldf query retrieves only 1 row of data, while there are multiple employee IDs who have completed just 1 job. Why does it stop at the first minimum value? And how do I fetch all rows with the minimum value in a column? Here is a data sample:

ID  TaskCOunt
1    74
2    53
3    10
4     5
5     1
6     1
7     1

The code I have used:

sqldf("select id, min(taskcount) as Jobscompleted
       from (select id,count(id) as taskcount 
            from MyData
            where id is not null 
            group by id order by id)")

Output is

ID   leastcount
5     1

While what I want is all the rows with minimum jobs completed.

ID  Jobscompleted
5     1
6     1 
7     1
pyeR_biz
  • 986
  • 12
  • 36

2 Answers2

2

min(...) always returns one row in SQL as do all SQL aggregate functions. Try this instead:

sqldf("select ID, TaskCount TasksCompleted from MyData 
       where TaskCount = (select min(TaskCount) from MyData)")

giving:

   ID TasksCompleted
1  5              1
2  6              1
3  7              1

Note: The input in reproducible form is:

Lines <- "
ID  TaskCount
1    74
2    53
3    10
4     5
5     1
6     1
7     1"
MyData <- read.table(text = Lines, header = TRUE)
G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341
0

As an alternative to sqldf, you could use data.table:

library(data.table)
dt <- data.table(ID=1:7, TaskCount=c(74, 53, 10, 5, 1, 1, 1))

dt[TaskCount==min(TaskCount)]

##    ID TaskCount
## 1:  5         1
## 2:  6         1
## 3:  7         1
dnlbrky
  • 9,396
  • 2
  • 51
  • 64