
Sorry for asking what might be a very basic question, but I am stuck in a conundrum and cannot seem to get out of it.

I have data that looks like

Medicine  Biology  Business sex weights
0           1          0     1     0.5
0           0          1     0     1
1           0          0     1     05
0           1          0     0     0.33
0           0          1     0     0.33
1           0          0     1     1 
0           1          0     0     0.33
0           0          1     1     1
1           0          0     1     1

Where the first three are fields of study, and the fourth variable is gender, obviously with many more observations. What I want is the mean of each field of study (medicine, biology, business) by the variable sex (so the mean for men and the mean for women). To do so, I have used the following code:

barplot_sex <- aggregate(x = df_dummies[, 1:19], by = list(df$sex),
                         FUN = function(x) mean(x))

Which works perfectly and gives me what I needed. My problem is that I now need to use a weighted mean, but I cannot use

FUN= function(x) weighted.mean(x, weights)

as there are many more observations than fields of study.
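A brief illustration of the mismatch, as a sketch with made-up vectors (`x_group`, `w_all` and `in_group` are hypothetical names): `aggregate` passes `FUN` only the values of one column within one group, while the weight vector still has one entry per row of the full data, so `weighted.mean` sees two vectors of different lengths.

```r
# One column restricted to one group: 4 of the 9 rows (toy values)
x_group <- c(0, 1, 1, 0)
# Weights for all 9 rows of the full data
w_all <- c(0.5, 1, 5, 0.33, 0.33, 1, 0.33, 1, 1)

# weighted.mean() needs x and w to have the same length, so this call errors
res <- tryCatch(weighted.mean(x_group, w_all), error = function(e) e)
inherits(res, "error")

# Subsetting the weights to the same rows as the group makes it work
in_group <- c(FALSE, TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, FALSE, FALSE)
ok <- weighted.mean(x_group, w_all[in_group])
ok
```

So the weights have to be subset by group together with the values, which `aggregate` does not do on its own.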

The only alternative I managed to do was to edit(boxplot) and change the values manually, but then R doesn't save the changes. Plus, I am sure there must be a trivial way to do exactly what I need.

Any help would be greatly appreciated.

Bests, Gabriele

Gabriele
  • What are the weights you'd like to use? What is your expected outcome? On a side note, you could shorten your current function to `aggregate(df_dummies, list(df$sex), mean)` or `aggregate(.~sex, df_dummies, mean)` with the same results – Daniel O Jul 09 '20 at 11:37
  • The weights are for repeated observations, so they are non-integers, as some students study more than a single field. The expected result would be similar to what I get now, just with the weighted average rather than the simple average. Thank you for the info on how to shorten the code btw – Gabriele Jul 09 '20 at 11:43

1 Answer


Using `by`, which splits the data by `sex` and computes the weighted mean of each field within each group:

by(dat, dat$sex, function(x) sapply(x[, 1:3], weighted.mean, x[, "weights"]))
# dat$sex: 0
# Medicine   Biology  Business 
# 0.0000000 0.3316583 0.6683417 
# --------------------------------------------------------------------------------------- 
# dat$sex: 1
# Medicine    Biology   Business 
# 0.82352941 0.05882353 0.11764706 

Data:

dat <- structure(list(Medicine = c(0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 1L
), Biology = c(1L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L), Business = c(0L, 
1L, 0L, 0L, 1L, 0L, 0L, 1L, 0L), sex = c(1L, 0L, 1L, 0L, 0L, 
1L, 0L, 1L, 1L), weights = c(0.5, 1, 5, 0.33, 0.33, 1, 0.33, 
1, 1)), class = "data.frame", row.names = c(NA, -9L))
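If a plain matrix is easier to work with than a `by` object (for example, to pass to `barplot`), the same computation can be sketched with `split` and `sapply`. This is a minimal sketch on the example data above; the names `res`, `grp1` and `by_hand` are illustrative and not part of the original answer.

```r
# Example data, same values as in the answer above
dat <- data.frame(
  Medicine = c(0, 0, 1, 0, 0, 1, 0, 0, 1),
  Biology  = c(1, 0, 0, 1, 0, 0, 1, 0, 0),
  Business = c(0, 1, 0, 0, 1, 0, 0, 1, 0),
  sex      = c(1, 0, 1, 0, 0, 1, 0, 1, 1),
  weights  = c(0.5, 1, 5, 0.33, 0.33, 1, 0.33, 1, 1)
)

# Split the rows by sex, then take the weighted mean of each field
# using only the weights that belong to that group
res <- t(sapply(split(dat, dat$sex), function(g) {
  sapply(g[1:3], weighted.mean, w = g$weights)
}))
res  # rows: sex groups "0" and "1"; columns: the three fields

# Check one cell by hand: sum(w * x) / sum(w) for Medicine where sex == 1
grp1 <- dat$sex == 1
by_hand <- sum(dat$weights[grp1] * dat$Medicine[grp1]) / sum(dat$weights[grp1])
all.equal(unname(res["1", "Medicine"]), by_hand)
```

With the real data, `g[1:3]` would presumably become the full range of field columns (such as `1:19` in the question), provided the `sex` and `weights` columns live in the same data frame.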
jay.sf
  • I'm confused, did you make up weights? or am I missing something? – Daniel O Jul 09 '20 at 11:53
  • @DanielO I invented some because I thought they were missing :) Ah, did you intend to weight by sex?? – jay.sf Jul 09 '20 at 11:56
  • I think the example is unrepresentative. I believe OP has situations where a row has `1` in multiple subjects and wants to adjust the weights such that `all(rowSums(dat[,1:3])==1)` remains `TRUE`. That's what I got out of OP's answer to my comment at least. It's still unclear. – Daniel O Jul 09 '20 at 12:01
  • So then @jay.sf has a correct answer. It can also be shortened (at the slight expense of readability) with `by(dat, dat$sex, function(x) sapply(x[, 1:3], weighted.mean, x[, 5]))` – Daniel O Jul 09 '20 at 12:06
  • I have modified the post to show what the weights look like. What I need is basically the weighted average of business for men, and the weighted average of business for women. I could do this using `lm` and adding the weights there, but it seems like I'm taking unnecessary extra steps for something that should be trivial! – Gabriele Jul 09 '20 at 12:07
  • @Gabriele Great, adapted my answer to your weights! – jay.sf Jul 09 '20 at 12:11
  • @DanielO Thx, applied to answer. – jay.sf Jul 09 '20 at 12:12