How do I use condition with rows in a dataframe (weighted average, R language)?

Question

I need to calculate the weighted average of each row in the dataframe, where: Does anyone know how to do it using the R language? regards

t1 <- c(1, 2, 4, 6, 7, 9)
t2 <- c(6, 6, 5, 3, 3, 7)
df <- data.frame(t1 = t1, t2=t2, stringsAsFactors = FALSE)

if value <= 5 , weight is 1
if value > 5 and <= 8 , weight is 2
if value > 8 , weight is 3

www · Answer 1 · 2019-03-03T16:33:18.790

2

A solution using tidyverse to calculate the weighted mean for each row.

library(tidyverse)

df2 <- df %>%
  # Add row numbers
  rowid_to_column() %>%
  # Convert to long format
  gather(Group, Value, -rowid) %>%
  # Assign weight
  mutate(Weight = case_when(
    Value <= 5                   ~1,
    Value >  5 & Value <= 8      ~2,
    Value > 8                    ~3,
    TRUE                         ~NA_real_
  )) %>%
  # Calculated weighted average
  group_by(rowid) %>%
  summarize(Weighted_Mean = weighted.mean(Value, Weight)) %>%
  ungroup()
df2
#   rowid Weighted_Mean
#   <int>         <dbl>
# 1     1          4.33
# 2     2          4.67
# 3     3          4.5 
# 4     4          5   
# 5     5          5.67
# 6     6          8.2

edited Mar 03 '19 at 16:33

answered Mar 03 '19 at 16:24

www

38,575
12
48
84

1

The OP wants means *by row*, not by column. – Rui Barradas Mar 03 '19 at 16:28
@RuiBarradas Thanks. I have updated my answer. But why do we have different results? Perhaps I misunderstand the meaning of weighted average? – www Mar 03 '19 at 16:35
`weighted.mean` normalizes the weights to sum to `1`. I will update my answer. – Rui Barradas Mar 03 '19 at 17:38

Rui Barradas · Answer 2 · 2019-03-04T14:05:54.647

A base R solution, function findInterval does the main part. Then multiply the result with df and get the row means.

t1 <- c(1, 2, 4, 6, 7, 9)
t2 <- c(6, 6, 5, 3, 3, 7)
df <- data.frame(t1 = t1, t2=t2, stringsAsFactors = FALSE)

cp <- c(-Inf, 5, 8, Inf)

Edit.

If the weights are normalized to sum to 1, then the right answer will be any of the following two.

wt <- sapply(df, findInterval, cp)
rowSums(df*(wt/apply(wt, 1, sum)))
#[1] 4.333333 4.666667 4.666667 5.000000 5.666667 8.200000

sapply(1:nrow(df), function(i)
  weighted.mean(df[i,], sapply(df, findInterval, cp)[i,]))
#[1] 4.333333 4.666667 4.666667 5.000000 5.666667 8.200000

These results are now equal to the results in the other answers.

arg0naut91 · Answer 3 · 2019-03-03T18:08:21.373

2

If you'd like to multiply by weights & then divide by their sum (equal to weighted.mean function in R):

df %>%
  mutate_at(vars(t1, t2),
            list(weights = ~ case_when(. <= 5 ~ 1,
                                       . > 5 & . <= 8 ~ 2,
                                       TRUE ~ 3))) %>%
  mutate(rowMeanWeighted = rowSums(.[, 1:2] * .[, 3:4]) / rowSums(.[, 3:4])) %>%
  select(-contains("weights"))

Output:

  t1 t2 rowMeanWeighted
1  1  6        4.333333
2  2  6        4.666667
3  4  5        4.500000
4  6  3        5.000000
5  7  3        5.666667
6  9  7        8.200000

edited Mar 03 '19 at 18:08

answered Mar 03 '19 at 16:36

arg0naut91

14,574
2
17
38

Different than my results but the same as Rui Barradas. What is the definition of a weighted average? – www Mar 03 '19 at 16:38
@www, I take it as multiplication by a weight given the description above. Not sure if the right approach though, let's see what the OP says. – arg0naut91 Mar 03 '19 at 16:42
2

I did not see multiplication by a weight mentioned in the original post, but let’s wait for clarification. – www Mar 03 '19 at 16:51
@www, tbh after another thought even if it would be explicitly mentioned a result that doesn't go between minimum and maximum of the vector wouldn't make much sense, have deleted my first answer – arg0naut91 Mar 03 '19 at 18:10

How do I use condition with rows in a dataframe (weighted average, R language)?

3 Answers3