4

I have a dataset like this:

DEPTNO ENAME
   10 CLARK
   10 KING
   10 MILLER
   20 ADAMS
   20 FORD
   20 JONES

And I am trying to create a results like this:

DEPTNO AGGREGATED_ENAMES
   10 CLARK,KING,MILLER
   20 ADAMS,FORD,JONES

In Oracle this can be done by:

SQL> SELECT deptno
    ,LISTAGG(ename, ',') WITHIN GROUP (ORDER BY ename) AS employees
   FROM   emp
   GROUP  BY
   deptno;

How can I do this in SQLDF in R?

Or if it is not possible in R SQLDF, how can I do it in R?

Thanks! Parth

Henrik
  • 65,555
  • 14
  • 143
  • 159
Parth Tiwari
  • 855
  • 2
  • 9
  • 23

2 Answers2

9

Use group_concat like this:

sqldf("select DEPTNO, group_concat(ENAME) ENAMES from emp group by DEPTNO")

giving:

  DEPTNO            ENAMES
1     10 CLARK,KING,MILLER
2     20  ADAMS,FORD,JONES
G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341
4

First, load your data into a data.frame and set stringsAsFactors to FALSE.

> v1=c(10, 10,20)
> v2=c('CLARK','KING','ADAMS')

> df = data.frame(v1,v2, stringsAsFactors = FALSE)
df
  v1    v2
1 10 CLARK
2 10  KING
3 20 ADAMS

> str(df)
'data.frame':   3 obs. of  2 variables:
 $ v1: num  10 10 20
 $ v2: chr  "CLARK" "KING" "ADAMS"

Next, use the aggregate function between the two columns:

> aggregate(v2 ~ v1, df, c)
  v1          v2
1 10 CLARK, KING
2 20       ADAMS
Myles Baker
  • 3,600
  • 2
  • 19
  • 25
  • I vaguely remember that by using `c` as aggregation function, your v2 column in the result will contain a list - if I remember correctly. In case they want a single string, paste - collapse or toString might be useful as function. – talat Jan 21 '15 at 19:56
  • ok! so this did the trick, >aggregate(cbind(v2,v3,v4)~v1,df,FUN=function(x)paste(x,collapse=";")) @docendodiscimus, you are right, c gives a list. It would still help if there is a way to do this in SQLDF – Parth Tiwari Jan 21 '15 at 20:06
  • 3
    @ParthTiwari, thanks for confirming that. Your function looks good, and you could use `toString` (without the need for an anonymous function) to collapse with ", " or, you could also use `aggregate(v2 ~ v1, df, paste, collapse = ", ")` (because aggregate has an `...` parameter you can pass to the function. Also, if you had only columns v1..v4 in your data.frame you could modify your aggregate to `aggregate(.~v1, df, paste, collapse=";")` - just fyi. – talat Jan 21 '15 at 20:12