101

I have a data frame with several columns, some numeric and some character. How do I compute the sum of a specific column? I've googled this and found numerous functions (sum, cumsum, rowsum, rowSums, colSums, aggregate, apply), but I can't make sense of them all.

For example, suppose I have a data frame people with the following columns:

people <- read.table(
  text = 
    "Name Height Weight
    Mary 65     110
    John 70     200
    Jane 64     115", 
  header = TRUE
)
…

How do I get the sum of all the weights?

malexan
User

5 Answers

129

You can just use sum(people$Weight).

sum sums up a vector, and people$Weight retrieves the weight column from your data frame.

Note - you can get built-in help by using ?sum, ?colSums, etc. (by the way, colSums will give you the sum for each column).
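As a minimal sketch of how sum and colSums behave on this data: the fourth row (Alex, with a missing weight) is a hypothetical addition to the question's example, included only to show what a missing value does to the result.

```r
# Example data from the question, plus one hypothetical row with a missing weight
people <- read.table(
  text = "Name Height Weight
Mary 65 110
John 70 200
Jane 64 115
Alex 68 NA",
  header = TRUE
)

total_with_na <- sum(people$Weight)                # NA, because one value is missing
total_weight  <- sum(people$Weight, na.rm = TRUE)  # 425, the missing value is skipped

# colSums() needs numeric input, so keep only the numeric columns first
numeric_cols  <- people[sapply(people, is.numeric)]
column_totals <- colSums(numeric_cols, na.rm = TRUE)  # Height = 267, Weight = 425
```

Selecting the numeric columns before calling colSums avoids the error that the character column Name would otherwise cause.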

mathematical.coffee
  • 2
    when I do this I get: `[1] NA`. I looked at the data for this column and the very last row has NA, is that why? – User Mar 12 '12 at 23:27
  • 11
    Yep, that's why. You can ignore the NAs if you want via `sum(people$Weight,na.rm=TRUE)` (you can read about this option in `?sum`). – mathematical.coffee Mar 12 '12 at 23:28
11

To sum values in a data.frame, you first need to extract them as a vector.

There are several ways to do it:

# $ operator
x <- people$Weight
x
# [1] 110 200 115

Or use [, ], as with a matrix:

x <- people[, 'Weight']
x
# [1] 110 200 115

Once you have the vector you can use any vector-to-scalar function to aggregate the result:

sum(people[, 'Weight'])
# [1] 425

If you have NA values in your data, you should specify the na.rm parameter:

sum(people[, 'Weight'], na.rm = TRUE)
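A short sketch of the same idea using the question's example data: once the column has been extracted as a vector, other vector-to-scalar functions work the same way as sum. The [[ ]] accessor shown here is one more way to extract the column and is not used in the answer above.

```r
# Example data from the question
people <- read.table(
  text = "Name Height Weight
Mary 65 110
John 70 200
Jane 64 115",
  header = TRUE
)

# [[ ]] returns the column as a plain vector, just like $ and [, 'Weight']
w <- people[["Weight"]]

total   <- sum(w)    # 110 + 200 + 115 = 425
average <- mean(w)   # 425 / 3
largest <- max(w)    # 200
```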
Bulat
3

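To order the columns by their sums (colSums needs numeric columns, so non-numeric ones such as Name are dropped first):

order(colSums(people[sapply(people, is.numeric)]), decreasing = TRUE)

If the data frame has more than 20 columns:

order(colSums(people[, 5:25]), decreasing = TRUE) ## leaves the first 4 columns out of the sums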
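Note that order returns positions rather than the sorted sums themselves. A minimal sketch with the question's example data, where the variable names sums, idx, and sorted_sums are only illustrative:

```r
# Example data from the question
people <- read.table(
  text = "Name Height Weight
Mary 65 110
John 70 200
Jane 64 115",
  header = TRUE
)

# Keep only the numeric columns, since colSums() fails on character data
numeric_cols <- people[sapply(people, is.numeric)]

sums <- colSums(numeric_cols)           # Height = 199, Weight = 425
idx  <- order(sums, decreasing = TRUE)  # positions of the sums, largest first
sorted_sums <- sums[idx]                # Weight = 425, then Height = 199
```

Indexing sums with idx gives the column totals in descending order, with the column names kept.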
ParkerHalo
sai saran
3

You can use the tidyverse packages to solve it; it would look like the following (which is more readable to me):

library(tidyverse)
people %>%
  summarise(sum(Weight, na.rm = TRUE))
Birasafab
2

When the column contains NA values, pass na.rm = TRUE (as.numeric converts the column first, in case it was read in as character):

sum(as.numeric(JuneData1$Account.Balance), na.rm = TRUE)
Dheeraj Inampudi