0

I have a dataframe with Transaction ID and Product Name as columns. I'm trying to create a 3rd column which gives me the count of Transaction ID's. The final dataframe should look as shown below.

    TID       Product        Orders         
    100       iPhone           2  
    100       Samsung          2  
    101       Lenovo           3  
    101       iPad             3  
    101       Galaxy           3  
    102       iPhone           1  
    103       HTC              1  

I tried using the length function, but that gives me the length of the entire column but not individual TIDs.

df$Orders <- length(df$Tid)  

I also tried using the sqldf function as shown. But that gives only distinct values of TID.

test <- sqldf("Select TID, count(TID) as Orders, Product from df Group By TID")
David Arenburg
  • 91,361
  • 17
  • 137
  • 196
red
  • 53
  • 5

3 Answers3

3

We can use one of the aggregate by group functions. Using dplyr, we group by the 'TID' column, create a new column 'Orders' as the number of observations within each group (n()) using mutate

library(dplyr)
df1 %>%
  group_by(TID)%>%
  mutate(Orders=n())
#    TID Product Orders
#1 100  iPhone      2
#2 100 Samsung      2
#3 101  Lenovo      3
#4 101    iPad      3
#5 101  Galaxy      3
#6 102  iPhone      1
#7 103     HTC      1

Or using data.table, we convert the 'data.frame' to 'data.table' (setDT(df1)). Grouped by 'Product', we create a new column ('Orders') as the number of observations within each group (.N).

library(data.table)
setDT(df1)[, Orders:=.N, by=Product] 

Or an option with sqldf, where we left join the original dataset with the modified dataset.

library(sqldf)
sqldf('Select * from df1
       left join(select TID, 
        count(TID) as Orders 
        from df1
        group by TID) 
        using (TID)')
    using (TID)')
#  TID Product Orders
#1 100  iPhone      2
#2 100 Samsung      2
#3 101  Lenovo      3
#4 101    iPad      3
#5 101  Galaxy      3
#6 102  iPhone      1
#7 103     HTC      1

data

df1 <- structure(list(TID = c(100L, 100L, 101L, 101L, 101L, 102L, 103L
), Product = c("iPhone", "Samsung", "Lenovo", "iPad", "Galaxy", 
"iPhone", "HTC")), .Names = c("TID", "Product"), row.names = c(NA, 
-7L), class = "data.frame")
akrun
  • 874,273
  • 37
  • 540
  • 662
1

Base package:

df1$count <- ave(df1$TID, df1$TID, FUN=length)

Output:

  TID Product count
1 100  iPhone     2
2 100 Samsung     2
3 101  Lenovo     3
4 101    iPad     3
5 101  Galaxy     3
6 102  iPhone     1
7 103     HTC     1
mpalanco
  • 12,960
  • 2
  • 59
  • 67
0

You could use data.table package:

library(data.table)
setDT(df)
df[, .(Orders = .N), by = Product]
Andriy T.
  • 2,020
  • 12
  • 23
  • 1
    @DavidArenburg you're right, it does nothing, gives the same result as `.(Orders = .N)` – Andriy T. Aug 12 '15 at 11:54
  • 1
    @akrun, Yeah, looking for a fast solution I've missunderstand the desired output, also because of missing clear input. In this case the correct answer using `data.table` is your's, with a little edit: `by = TID` – Andriy T. Aug 12 '15 at 12:58