0

I have a data frame that looks like this:

query        page           clicks
apples       /apples        50
oranges      /oranges       25
apples       /bad-apples    40
bananas      /bananas       25
apples       /all-fruits    10

I need this, where a new column, querySum, is created that sums up all of the clicks for a particular query. So, in the example above, the row for apples should always show 100 (50+40+10).

query        page           clicks   querySum
apples       /apples        50       100
oranges      /oranges       25       25
apples       /bad-apples    40       100
bananas      /bananas       25       25
apples       /all-fruits    10       100

I tried dplyr, but it created a smaller table with just unique values. Is there a way to apply this to every row in a new column, perhaps using a loop?

df$querySum <- df %>% group_by(query) %>% summarise(querySum = sum(clicks))
neilfws
  • 32,751
  • 5
  • 50
  • 63
Jeff Swanson
  • 45
  • 1
  • 8

1 Answers1

1

Try the following:

library(tidyverse);
df %>% group_by(query) %>% mutate(querySum = sum(clicks))
## A tibble: 5 x 4
## Groups:   query [3]
#  query   page        clicks querySum
#  <fct>   <fct>        <int>    <int>
#1 apples  /apples         50      100
#2 oranges /oranges        25       25
#3 apples  /bad-apples     40      100
#4 bananas /bananas        25       25
#5 apples  /all-fruits     10      100

Or in base R you can use ave:

df$querySum = ave(df$clicks, list(df$query), FUN = sum);
df;
#    query        page clicks querySum
#1  apples     /apples     50      100
#2 oranges    /oranges     25       25
#3  apples /bad-apples     40      100
#4 bananas    /bananas     25       25
#5  apples /all-fruits     10      100    

Sample data

df <- read.table(text =
    "query        page           clicks
apples       /apples        50
oranges      /oranges       25
apples       /bad-apples    40
bananas      /bananas       25
apples       /all-fruits    10", header = T)
Maurits Evers
  • 49,617
  • 4
  • 47
  • 68
  • I would use `ungroup()` at the end, i.e.: `df %>% group_by(query) %>% mutate(querySum = sum(clicks)) %>% ungroup()` to revert back to a non-grouped df. – byouness Apr 30 '18 at 20:02
  • @uness Sure. Depending on further data manipulation steps, `ungroup`ing may be useful or even necessary. In OPs case, `ungroup` makes no difference, and it becomes more of a conceptual & code design question; i.e. should `group_by` always be followed up with `ungroup`. In cases such as this, I opt for conciseness. – Maurits Evers Apr 30 '18 at 21:56