The title may be a little confusing, so I will try to explain it better here.

Let's say I have a data frame:

> df = data.frame(a=c(8,6,4,2),b=c(9,7,4,3),c=c(10,6,3,3),d=c(8,6,3,2))
> df
  a b  c d
1 8 9 10 8
2 6 7  6 6
3 4 4  3 3
4 2 3  3 2

My desired output would be:

> dfDesired = data.frame(a=c(8,6,4,2),b=c(0.33,0.37,0.4,0.38),c=c(0.37,0.32,0.3,0.38)
+                        ,d=c(0.3,0.32,0.3,0.25))
> dfDesired
  a    b    c    d
1 8 0.33 0.37 0.30
2 6 0.37 0.32 0.32
3 4 0.40 0.30 0.30
4 2 0.38 0.38 0.25

First, I only want the computation done on specific columns, in this case columns b, c and d. Second, I want to sum the values in each row across those columns, so for row 1, 9+10+8=27. Then I want the ratio of each cell to its row sum: again for row 1, 9/27=0.33, 10/27=0.37, 8/27=0.30, and likewise for the other rows.

How can this be accomplished in R?

Logica
cap

3 Answers

We can use prop.table with margin = 1 to calculate row-wise proportions.

cbind(df[1], prop.table(as.matrix(df[-1]), 1))

#  a     b     c     d
#1 8 0.333 0.370 0.296
#2 6 0.368 0.316 0.316
#3 4 0.400 0.300 0.300
#4 2 0.375 0.375 0.250

To make the selection of columns more explicit:

cols <- c("b", "c", "d")
cbind(df[setdiff(names(df), cols)], prop.table(as.matrix(df[cols]), 1))
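Note that cbind() places the proportion columns after the untouched ones, so the column order changes whenever `cols` are not already last. A minimal sketch that keeps the original order, assuming a hypothetical helper name `row_props` (it is not part of base R or of the answer above):

```r
# Hypothetical helper (the name row_props is an assumption for illustration):
# replace the selected columns with their row-wise proportions, keeping
# every column in its original position.
row_props <- function(data, cols) {
  out <- data
  out[cols] <- prop.table(as.matrix(data[cols]), 1)
  out
}

# Example data from the question.
df <- data.frame(a = c(8, 6, 4, 2), b = c(9, 7, 4, 3),
                 c = c(10, 6, 3, 3), d = c(8, 6, 3, 2))
res <- row_props(df, c("b", "c", "d"))
res
```

Each row of the selected columns in `res` should then sum to 1, while column a is left as it was.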
Ronak Shah
  • Thank you for the response. What if I want to calculate on specific columns though? Let's say I have an arbitrary number of columns, how can I apply this to specific columns? – cap Dec 03 '19 at 07:30
  • @cap I have updated the answer, so that this would be applied only to `cols` columns. – Ronak Shah Dec 03 '19 at 07:33

We can get the rowSums of columns 'b', 'c' and 'd', and then divide those columns by it:

dfnew <- df
dfnew[-1] <- round(df[-1]/rowSums(df[-1]), 2)
dfnew
#  a    b    c    d
#1 8 0.33 0.37 0.30
#2 6 0.37 0.32 0.32
#3 4 0.40 0.30 0.30
#4 2 0.38 0.38 0.25

The same pattern generalizes to any set of columns: replace -1 with the names or positions of the columns to use.


Or using tidyverse

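library(purrr)
library(dplyr)
library(magrittr)
# Row totals of every column except 'a'
total <- df %>%
  select(-a) %>%
  reduce(`+`)
# Divide each selected column by the row totals, then put 'a' back in front
df %>%
  select(-a) %>%
  divide_by(total) %>%
  bind_cols(df['a'], .)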
akrun

A more generalisable approach is using apply():

df[-1] <- t(apply(df[-1], 1, function(x) x / sum(x)))
df
  a         b         c         d
1 8 0.3333333 0.3703704 0.2962963
2 6 0.3684211 0.3157895 0.3157895
3 4 0.4000000 0.3000000 0.3000000
4 2 0.3750000 0.3750000 0.2500000
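
Because the assignment above overwrites df, it helps to rebuild the data before comparing methods. As a rough sanity check, the base R approaches in this thread (prop.table, rowSums and apply) should give the same proportions. A sketch, where the variable names `p_prop`, `p_sums`, `p_apply` and `same` are chosen only for illustration:

```r
# Rebuild the example data from the question.
df <- data.frame(a = c(8, 6, 4, 2), b = c(9, 7, 4, 3),
                 c = c(10, 6, 3, 3), d = c(8, 6, 3, 2))
cols <- c("b", "c", "d")

# Three ways to compute row-wise proportions of the selected columns.
p_prop  <- prop.table(as.matrix(df[cols]), 1)
p_sums  <- as.matrix(df[cols] / rowSums(df[cols]))
p_apply <- t(apply(df[cols], 1, function(x) x / sum(x)))

# Compare values only; dimnames may differ between the approaches.
same <- isTRUE(all.equal(unname(p_prop), unname(p_sums))) &&
  isTRUE(all.equal(unname(p_prop), unname(p_apply)))
same
```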
s_baldur