I have the following example set of data:

Example<-data.frame(A=10*1:9,B=10*10:18)

rownames(Example)<-paste("Sample",1:9)
> Example
          A   B
Sample 1 10 100
Sample 2 20 110
Sample 3 30 120
Sample 4 40 130
Sample 5 50 140
Sample 6 60 150
Sample 7 70 160
Sample 8 80 170
Sample 9 90 180

I am trying to divide each element in both columns by its column's total. I have tried a variety of methods, but I feel like I am missing a fundamental piece of code that would make this easier. I have gotten this far:

ExampleSum1 <- sum(Example[,1])
ExampleSum2 <- sum(Example[,2])

But I don't know how to divide 10, 20, 30, etc. by ExampleSum1, and likewise the second column by ExampleSum2.

Adam

5 Answers

data.table solution:

sum.cols = c("A", "B")
library(data.table)
setDT(Example, keep.rownames = TRUE)
Example[ , (sum.cols) := lapply(.SD, function(x) x/sum(x)), .SDcols = sum.cols]

Or perhaps more direct in your case:

Example[ , c("A", "B") := .(A/sum(A), B/sum(B))]

Which gives:

Example
#          rn          A          B
# 1: Sample 1 0.02222222 0.07936508
# 2: Sample 2 0.04444444 0.08730159
# 3: Sample 3 0.06666667 0.09523810
# 4: Sample 4 0.08888889 0.10317460
# 5: Sample 5 0.11111111 0.11111111
# 6: Sample 6 0.13333333 0.11904762
# 7: Sample 7 0.15555556 0.12698413
# 8: Sample 8 0.17777778 0.13492063
# 9: Sample 9 0.20000000 0.14285714

The main appeal of this approach, as opposed to one using colSums or sweep, is that both of those require converting your data to a matrix and then back, which may be costly. Whether that matters depends on your use case: if your table is small, the other approaches are fine, and the choice comes down to which you find most readable.
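For comparison, the base R alternatives mentioned above could look roughly like this. It is a minimal sketch rather than part of the original answer; the name `df` is made up here so that it does not clash with `Example`, which `setDT` converts in place.

```r
# Hypothetical copy of the question's data (the name `df` is an assumption).
df <- data.frame(A = 10 * 1:9, B = 10 * 10:18)

# sweep() divides each column (MARGIN = 2) by the matching column total.
props_sweep <- sweep(df, 2, colSums(df), "/")

# The same calculation via an explicit double transpose.
props_t <- t(t(df) / colSums(df))

# Every column of proportions should sum to 1.
colSums(props_sweep)
colSums(props_t)
```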

I also notice that no other answer mentions the mapply approach, which would work in almost any paradigm; here's the data.table version:

Example[ , (sum.cols) := mapply(`/`, .SD, lapply(.SD, sum), SIMPLIFY = FALSE), 
        .SDcols = sum.cols]
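As an illustration of the same mapply idea outside data.table, here is a base R sketch. The data frame `df` and the result name `props` are made up for this example.

```r
# Hypothetical copy of the question's data (the name `df` is an assumption).
df <- data.frame(A = 10 * 1:9, B = 10 * 10:18)

# Pair each column with its own total and divide; SIMPLIFY = FALSE keeps
# the result as a list of columns instead of collapsing it to a matrix.
props <- df
props[] <- mapply(`/`, df, lapply(df, sum), SIMPLIFY = FALSE)

# Every column of proportions should sum to 1.
colSums(props)
```

Assigning into `props[]` keeps the data frame structure and its column names.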
MichaelChirico

You can get the column sums with colSums, and use paste to build new column names derived from the existing ones. colSums returns a vector of the column sums, but dividing column-wise by that vector takes a little trickery. The best way looks to be the one mentioned by @user20650.

## Make new columns: proportions of column sums
dat[,paste(names(dat),"prop", sep="_")] <- t( t(dat) / colSums(dat) )

dat
#          A   B     A_prop     B_prop
# Sample1 10 100 0.02222222 0.07936508
# Sample2 20 110 0.04444444 0.08730159
# Sample3 30 120 0.06666667 0.09523810
# Sample4 40 130 0.08888889 0.10317460
# Sample5 50 140 0.11111111 0.11111111
# Sample6 60 150 0.13333333 0.11904762
# Sample7 70 160 0.15555556 0.12698413
# Sample8 80 170 0.17777778 0.13492063
# Sample9 90 180 0.20000000 0.14285714
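The double transpose is needed because R recycles a vector down the columns of a matrix, not across its rows. A small sketch with made-up numbers (the matrix `m` is hypothetical, not taken from the question) shows the difference:

```r
# Hypothetical 3 x 2 matrix; column totals are 4 and 40.
m <- cbind(A = c(1, 1, 2), B = c(10, 10, 20))
totals <- colSums(m)

# Direct division recycles `totals` down each column, so the values
# 4 and 40 alternate row by row and most cells get the wrong divisor.
wrong <- m / totals

# After transposing, each row holds one original column, so the recycled
# totals line up; transposing again restores the original shape.
right <- t(t(m) / totals)

colSums(wrong)  # not all 1
colSums(right)  # both 1
```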

Data

dat <- read.table(text="A      B
Sample1    10     100
Sample2    20     110
Sample3    30     120
Sample4    40     130
Sample5    50     140
Sample6    60     150
Sample7    70     160
Sample8    80     170
Sample9    90     180", header=TRUE)
Rorschach

What about just using apply:

 apply(dat, 2, function(x) x / sum(x))
                 A          B
Sample1 0.02222222 0.07936508
Sample2 0.04444444 0.08730159
Sample3 0.06666667 0.09523810
Sample4 0.08888889 0.10317460
Sample5 0.11111111 0.11111111
Sample6 0.13333333 0.11904762
Sample7 0.15555556 0.12698413
Sample8 0.17777778 0.13492063
Sample9 0.20000000 0.14285714
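One thing to be aware of is that apply coerces the data frame to a matrix and returns a matrix. The sketch below, using a made-up copy of the data named `df`, checks that and shows an lapply-based variant that keeps the data frame structure:

```r
# Hypothetical copy of the question's data (the name `df` is an assumption).
df <- data.frame(A = 10 * 1:9, B = 10 * 10:18)

# apply() over columns (MARGIN = 2) returns a numeric matrix.
m <- apply(df, 2, function(x) x / sum(x))
is.matrix(m)

# lapply() works column by column; assigning into props[] keeps
# the result a data frame with the same column names.
props <- df
props[] <- lapply(df, function(x) x / sum(x))
is.data.frame(props)
```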
SabDeM

Is this what you're after?

id <- paste("sample", c(1:9))

A <- seq(10, 90, 10)
B <- seq(100, 180, 10)

Example <- data.frame(id, A, B)

Example$A2 <- with(Example, A/sum(A))
Example$B2 <- with(Example, B/sum(B))

Note that this adds new columns A2 and B2 rather than overwriting A and B:

       id  A   B         A2         B2
 sample 1 10 100 0.02222222 0.07936508
 sample 2 20 110 0.04444444 0.08730159
 sample 3 30 120 0.06666667 0.09523810
 sample 4 40 130 0.08888889 0.10317460
 sample 5 50 140 0.11111111 0.11111111
 sample 6 60 150 0.13333333 0.11904762
 sample 7 70 160 0.15555556 0.12698413
 sample 8 80 170 0.17777778 0.13492063
 sample 9 90 180 0.20000000 0.14285714
John_dydx

You could simply do:

library(dplyr)
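# mutate_each() and funs() are deprecated in recent dplyr releases;
# the newer spelling is mutate(across(everything(), ~ .x / sum(.x)))
dat %>% mutate_each(funs(. / sum(.)))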

Which gives:

#           A          B
#1 0.02222222 0.07936508
#2 0.04444444 0.08730159
#3 0.06666667 0.09523810
#4 0.08888889 0.10317460
#5 0.11111111 0.11111111
#6 0.13333333 0.11904762
#7 0.15555556 0.12698413
#8 0.17777778 0.13492063
#9 0.20000000 0.14285714

If you want to keep rownames, do:

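# add_rownames() is likewise deprecated; tibble::rownames_to_column("rn")
# is its replacement, combined with mutate(across(-rn, ~ .x / sum(.x)))
dat %>% add_rownames("rn") %>% mutate_each(funs(. / sum(.)), -rn)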
Steven Beaupré