Summing up values in one column based on unique values in another column

Question

I am trying to add up the values in column C for each unique value in column B. For instance, for B = 1, I would like to add all the matching rows in column C, i.e. 5 + 4 + 3 = 12.

A B C
1 1 5
2 1 4 
3 1 3
4 2 1
5 2 3

for(i in unique(df$B)){
  df$D = sum(df$C)  
}

Also, I would like to add the number of times each value in column B occurs.

Solution :

Example from my task:

  docIdx newsgroup_ID  freq  
       1            1   768 
       2            1   125  
       3            1    29 
       4            1    51  
       5            1   198 
       6            1    34 
       7            1    64 
       8            2    35
       9            2    70
       10           2    45

Can you please include your expected output? This sounds like a job for `aggregate` or `group_by`/`summarise`. — Maurits Evers, Feb 18 '19 at 02:49

Maurits Evers · Accepted Answer · 2019-02-18T05:32:03.553


In base R you could use `ave`:

df[, c("D", "E")] <- with(df, sapply(c(sum, length), function(x) ave(C, B, FUN = x)))
df
#  A B C  D E
#1 1 1 5 12 3
#2 2 1 4 12 3
#3 3 1 3 12 3
#4 4 2 1  4 2
#5 5 2 3  4 2

Or using `dplyr`:

library(dplyr)
df <- df %>%
    group_by(B) %>%
    mutate(D = sum(C), E = length(C))
df
## A tibble: 5 x 5
## Groups:   B [2]
#      A     B     C     D     E
#  <int> <int> <int> <int> <int>
#1     1     1     5    12     3
#2     2     1     4    12     3
#3     3     1     3    12     3
#4     4     2     1     4     2
#5     5     2     3     4     2

Sample data

df <- read.table(text =
    "A B C
1 1 5
2 1 4
3 1 3
4 2 1
5 2 3", header = TRUE)

It works just fine with your revised data

df <- read.table(text =
    "docIdx newsgroup_ID  freq
       1            1   768
       2            1   125
       3            1    29
       4            1    51
       5            1   198
       6            1    34
       7            1    64
       8            2    35
       9            2    70
       10           2    45", header = TRUE)


df[, c("sum.freq", "length.freq")] <- with(df, sapply(c(sum, length), function(x) 
    ave(freq, newsgroup_ID, FUN = x)))
#   docIdx newsgroup_ID freq sum.freq length.freq
#1       1            1  768     1269           7
#2       2            1  125     1269           7
#3       3            1   29     1269           7
#4       4            1   51     1269           7
#5       5            1  198     1269           7
#6       6            1   34     1269           7
#7       7            1   64     1269           7
#8       8            2   35      150           3
#9       9            2   70      150           3
#10     10            2   45      150           3

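Here `ave(freq, newsgroup_ID, FUN = x)` applies the function `x` to `freq` within each `newsgroup_ID` group and returns a vector as long as `freq`, with each group's result repeated for every row of that group.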
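
To make the behaviour of `ave` concrete, here is a minimal sketch on a hypothetical data frame `toy` (a name introduced only for this illustration) that mirrors the question's sample. It contrasts `tapply`, which returns one value per group, with `ave`, which returns one value per row.

```r
# Sketch only: `toy` is a hypothetical data frame mirroring the question's sample.
toy <- data.frame(B = c(1, 1, 1, 2, 2), C = c(5, 4, 3, 1, 3))

# tapply() collapses to one value per group (here: two values).
group_sums <- tapply(toy$C, toy$B, sum)

# ave() keeps the original length and repeats each group's result per row.
row_sums   <- ave(toy$C, toy$B, FUN = sum)
row_counts <- ave(toy$C, toy$B, FUN = length)
```

Because `ave` preserves the row count, its result can be assigned directly as a new column, which is what the answer above relies on.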

edited Feb 18 '19 at 05:32

answered Feb 18 '19 at 02:56

Maurits Evers


Thank you @maurits, but it doesn't get included in the data frame or get saved. How do I do that? – shome Feb 18 '19 at 03:12
The base R solution adds two new columns; so `df` will contain the new columns; for the `dplyr` solution just store in a new `data.frame`, i.e. for example prefix with `df.new <- ...` – Maurits Evers Feb 18 '19 at 03:42
Sorry, but I am not sure how to save it. The output appears as you suggested, but I still couldn't save it. – shome Feb 18 '19 at 04:08
There is nothing to save. As I said, in the base R solution `df` will contain the new columns `"D"` and `"E"`; as for the `dplyr` solution, I've made an edit; `df <- df %>% ...` will overwrite `df` and add the two new columns. – Maurits Evers Feb 18 '19 at 04:11
I am not getting the answer with my dataset, although it works with the example you provided. For example:
  docIdx newsgroup_ID freq  total number
1      1            1  768 844174   7504
2      2            1  125 844174   7504
3      3            1   29 844174   7504
4      4            1   51 844174   7504
5      5            1  198 844174   7504
6      6            1   34 844174   7504
7      7            1   64 844174   7504
..
It is summing over the entire column irrespective of the different values in newsgroup_ID. – shome Feb 18 '19 at 05:20
Please add minimal & representative sample dataset in your original post; it's difficult to read code in comments. – Maurits Evers Feb 18 '19 at 05:23
I just included the representative sample dataset. – shome Feb 18 '19 at 05:27

score 0 · Answer 2 · answered Feb 18 '19 at 03:01


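B <- c(1, 1, 1, 2, 2)
C <- c(5, 4, 3, 1, 3)
x <- cbind(B, C)  # numeric matrix with columns B and C

# Sum C over the rows where B == 1.
# The accumulator is called `total` so it does not mask base::sum().
total <- 0
for (i in seq_len(nrow(x))) {
  if (x[i, "B"] == 1) {   # index row and column explicitly
    total <- total + x[i, "C"]
  }
}
total  # 12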

I hope this will help you.

answered Feb 18 '19 at 03:01

Roland Araza


It is showing an error: Error in `[.data.frame`(class_prior_newsgroup, i) : undefined columns selected. In addition: Warning messages: 1: In if (class_prior_newsgroup[i] == 1) { : the condition has length > 1 and only the first element will be used 2: In if (class_prior_newsgroup[i] == 1) { : the condition has length > 1 and only the first element will be used 3: In if (class_prior_newsgroup[i] == 1) { : the condition has length > 1 and only the first element will be used – shome Feb 18 '19 at 03:17
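
The error in the comment above is consistent with single-index subsetting behaving differently on a matrix and on a data frame. The sketch below uses hypothetical objects `m` and `d` (not the commenter's actual `class_prior_newsgroup`, whose contents are not shown) to illustrate the difference. On a matrix, `m[i]` picks the i-th element in column-major order, while on a data frame `d[i]` selects the i-th column.

```r
# Sketch only: `m` and `d` are hypothetical stand-ins for the data in this thread.
m <- cbind(B = c(1, 1, 1, 2, 2), C = c(5, 4, 3, 1, 3))  # matrix
d <- as.data.frame(m)                                    # data frame

# Matrix: a single index is a linear (column-major) position.
third_element <- m[3]

# Data frame: a single index asks for a column; with only two columns,
# requesting column 3 raises "undefined columns selected".
res <- tryCatch(d[3], error = function(e) e)

# Indexing row and column explicitly behaves the same for both.
from_matrix <- m[3, "B"]
from_frame  <- d[3, "B"]
```

Writing `x[i, 1]` or `x[i, "B"]` instead of `x[i]` therefore works whether `x` is a matrix or a data frame.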

score 0 · Answer 3 · answered Feb 18 '19 at 04:17

If you want to apply the same logic with a loop:

for (i in unique(df$B)) {
  in_group <- df$B == i                     # rows belonging to group i
  df$D[in_group] <- sum(df$C[in_group])     # group total
  df$E[in_group] <- length(df$C[in_group])  # group size
}
print(df)
  A B C  D E
1 1 1 5 12 3
2 2 1 4 12 3
3 3 1 3 12 3
4 4 2 1  4 2
5 5 2 3  4 2
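
The loop can also be replaced by a lookup, sketched here under the assumption that the data look like the question's sample. The names `toy`, `totals`, and `counts` are introduced only for this illustration. The idea is to compute one value per group and then index that result by each row's group label.

```r
# Sketch only: `toy` is a hypothetical copy of the question's sample data.
toy <- data.frame(A = 1:5, B = c(1, 1, 1, 2, 2), C = c(5, 4, 3, 1, 3))

totals <- tapply(toy$C, toy$B, sum)  # one sum per unique B, named by B
counts <- table(toy$B)               # one count per unique B, named by B

# Look up each row's group by name; as.vector() drops the names.
toy$D <- as.vector(totals[as.character(toy$B)])
toy$E <- as.vector(counts[as.character(toy$B)])
```

This does the same work as the loop without assigning into `toy` once per group, and it does not depend on the order of the rows.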

score 0 · Answer 4 · answered Feb 18 '19 at 05:35

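library(tibble)  # provides add_column()

B <- c(1, 1, 1, 2, 2)
C <- c(5, 4, 3, 1, 3)
x <- cbind(B, C)

# Note: this approach assumes the rows are already grouped by B.
holder1 <- c()  # per-row group counts
holder2 <- c()  # per-row group sums
for (num in unique(x[, "B"])) {
  total <- 0  # named `total` so it does not mask base::sum()
  count <- 0
  for (i in seq_len(nrow(x))) {
    if (x[i, "B"] == num) {
      total <- total + x[i, "C"]
      count <- count + 1
    }
  }
  holder1 <- c(holder1, rep(count, count))
  holder2 <- c(holder2, rep(total, count))
}
x <- as.data.frame(x)
x <- add_column(x, E = holder1, .after = "C")
x <- add_column(x, D = holder2, .after = "C")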


> x
  B C  D E
1 1 5 12 3
2 1 4 12 3
3 1 3 12 3
4 2 1  4 2
5 2 3  4 2

Note: Make sure your variable names match the ones used here, and read through the code so you understand it. I am not familiar with the higher-level functions, which is why I used basic constructs.
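
For comparison, here is a sketch of the higher-level route using base R's `aggregate`, which an early comment on the question suggested. The names `toy`, `sums`, `counts`, `summary_tbl`, and `joined` are hypothetical and exist only for this example. It builds a one-row-per-group summary and then merges it back onto the original rows.

```r
# Sketch only: `toy` is a hypothetical copy of the question's B and C columns.
toy <- data.frame(B = c(1, 1, 1, 2, 2), C = c(5, 4, 3, 1, 3))

# One row per unique B: the sum of C and the number of rows.
sums   <- aggregate(C ~ B, data = toy, FUN = sum)
counts <- aggregate(C ~ B, data = toy, FUN = length)
names(sums)[2]   <- "D"
names(counts)[2] <- "E"
summary_tbl <- merge(sums, counts, by = "B")

# Attach the per-group values to every original row.
joined <- merge(toy, summary_tbl, by = "B")
```

`summary_tbl` is useful when only the per-group totals are needed. `joined` matches the per-row layout shown above, and because `merge` matches rows on `B`, the rows do not need to be sorted by group first.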
