Count of a value in a new column in R

Question

I have a dataframe with Transaction ID and Product Name as columns. I'm trying to create a 3rd column which gives me the count of Transaction ID's. The final dataframe should look as shown below.

    TID       Product        Orders         
    100       iPhone           2  
    100       Samsung          2  
    101       Lenovo           3  
    101       iPad             3  
    101       Galaxy           3  
    102       iPhone           1  
    103       HTC              1

I tried using the length function, but that gives me the length of the entire column but not individual TIDs.

df$Orders <- length(df$Tid)

I also tried using the sqldf function as shown. But that gives only distinct values of TID.

test <- sqldf("Select TID, count(TID) as Orders, Product from df Group By TID")

The result can be easily obtained using excel COUNTIF but it takes ages for the query to run as the dataframe contains more than 100,000 rows. — red, Aug 12 '15 at 11:44
A vectorised base R approach could be `res <- table(df$TID); df$orders <- res[match(df$TID, names(res))]` — David Arenburg, Aug 12 '15 at 11:49

akrun · Accepted Answer · 2015-08-12T11:58:26.973

We can use one of the aggregate by group functions. Using dplyr, we group by the 'TID' column, create a new column 'Orders' as the number of observations within each group (n()) using mutate

library(dplyr)
df1 %>%
  group_by(TID)%>%
  mutate(Orders=n())
#    TID Product Orders
#1 100  iPhone      2
#2 100 Samsung      2
#3 101  Lenovo      3
#4 101    iPad      3
#5 101  Galaxy      3
#6 102  iPhone      1
#7 103     HTC      1

Or using data.table, we convert the 'data.frame' to 'data.table' (setDT(df1)). Grouped by 'Product', we create a new column ('Orders') as the number of observations within each group (.N).

library(data.table)
setDT(df1)[, Orders:=.N, by=Product]

Or an option with sqldf, where we left join the original dataset with the modified dataset.

library(sqldf)
sqldf('Select * from df1
       left join(select TID, 
        count(TID) as Orders 
        from df1
        group by TID) 
        using (TID)')
    using (TID)')
#  TID Product Orders
#1 100  iPhone      2
#2 100 Samsung      2
#3 101  Lenovo      3
#4 101    iPad      3
#5 101  Galaxy      3
#6 102  iPhone      1
#7 103     HTC      1

data

df1 <- structure(list(TID = c(100L, 100L, 101L, 101L, 101L, 102L, 103L
), Product = c("iPhone", "Samsung", "Lenovo", "iPad", "Galaxy", 
"iPhone", "HTC")), .Names = c("TID", "Product"), row.names = c(NA, 
-7L), class = "data.frame")

sorry @akrun. I might have clicked on the tick by mistake. – red Aug 13 '15 at 08:42 — red, Aug 13 '15 at 08:42
@red No problem. I also guessed so. – akrun Aug 13 '15 at 08:43 — akrun, Aug 13 '15 at 08:43

score 1 · Answer 2 · answered Aug 12 '15 at 11:55

1

Base package:

df1$count <- ave(df1$TID, df1$TID, FUN=length)

Output:

  TID Product count
1 100  iPhone     2
2 100 Samsung     2
3 101  Lenovo     3
4 101    iPad     3
5 101  Galaxy     3
6 102  iPhone     1
7 103     HTC     1

answered Aug 12 '15 at 11:55

mpalanco

12,960
2
59
67

Andriy T. · Answer 3 · 2015-08-12T11:57:08.377

0

You could use data.table package:

library(data.table)
setDT(df)
df[, .(Orders = .N), by = Product]

edited Aug 12 '15 at 11:57

answered Aug 12 '15 at 11:43

Andriy T.

2,020
12
23

1

@DavidArenburg you're right, it does nothing, gives the same result as `.(Orders = .N)` – Andriy T. Aug 12 '15 at 11:54
1

@akrun, Yeah, looking for a fast solution I've missunderstand the desired output, also because of missing clear input. In this case the correct answer using `data.table` is your's, with a little edit: `by = TID` – Andriy T. Aug 12 '15 at 12:58

Count of a value in a new column in R

3 Answers3

data