4

I have two data frames in R as below...I need to add new column(count_orders) in df1 which contains count of orders in df2 (or count of buyer in df2). Please help.

> df1
  buyer city
1     A   xx
2     B   yy
3     C   zz
> df2
  order buyer item
1     1     A    1
2     2     A    2
3     3     B    1
4     4     A    2
5     5     B    1
6     6     C    3
7     7     C    4

Expected output:

> df1
  buyer city count_orders
1     A   xx   3
2     B   yy   2
3     C   zz   2
Ravi Kumar
  • 161
  • 1
  • 6

3 Answers3

3

Here's a possible data.table solution that performs a binary join between df1 and df2 while calculating length while joining using by = .EACHI

library(data.table)  
setkey(setDT(df2), buyer)  
df2[df1, list(city, count_orders = .N), by = .EACHI]
#    buyer city count_orders
# 1:     A   xx            3
# 2:     B   yy            2
# 3:     C   zz            2

Alternative approach (modification of @nicolas comment) could be (which will update df1 by reference)

library(data.table)  
setkey(setDT(df1), buyer)  
df1[setDT(df2)[, .N, keyby = buyer], count_orders := i.N]
David Arenburg
  • 91,361
  • 17
  • 137
  • 196
2

You can try:

df1$count_orders<-as.vector(table(df2$buyer)[as.character(df1$buyer)])
#  buyer city count_orders
#1     A   xx            3
#2     B   yy            2
#3     C   zz            2
nicola
  • 24,005
  • 3
  • 35
  • 56
  • 1
    That's a very nice/efficient solution. works well even if some levels aren't present in `df2` – David Arenburg May 31 '15 at 09:55
  • Tx David, much appreciated! – nicola May 31 '15 at 09:55
  • Or maybe (not tested) `setkey(dt1,buyer);dt1[setkey(dt2[,list(count_orders=.N),by=buyer],buyer)]` where `dt1` and `dt2` are the `data.table` version of `df1` and `df2`. – nicola May 31 '15 at 10:11
  • @DavidArenburg I guess you could post your comment as answer to show a `data.table` solution and an `.EACHI` usage, which may be helpful in general. – nicola May 31 '15 at 10:20
  • Ok, btw, the problem with your solution is that it will return a vector, if only one group within `df1` matched to `df2`, while `data.table` simply doesn't have a `drop` argument – David Arenburg May 31 '15 at 10:22
  • Ok, I've added a modified version of your suggestion while adding `:= i.N` and now it works correctly. – David Arenburg May 31 '15 at 10:49
1

Here's a dplyr approach:

library(dplyr)
count(df2, buyer) %>% right_join(df1, "buyer")
#Source: local data frame [3 x 3]
#
#  buyer n city
#1     A 3   xx
#2     B 2   yy
#3     C 2   zz

You could use count(df2, buyer) %>% right_join(df1) and let dplyr figure out the column to join by on its own ("buyer" in this case).

talat
  • 68,970
  • 21
  • 126
  • 157