How to do a weighted T-test in R?

Question

I have df1:

PopDens     Score1   Group
93.53455  17.985288   B
137.13861 10.549394   A
35.98619  13.392857   A
89.69800   8.644537   B
16.27796  29.591635   A
25.33346  21.081301   F
89.69800   2.644537   C
46.27796  29.591635   A
25.33346   5.081301   B
36.27796  29.591635   A
 1.33346   9.081301   B

I would like to perform a t-test between groups A and B, looking at the difference in mean Score1.

However, I want to weight the analysis so that rows with a larger PopDens have a stronger weight in the analysis. For example, I don't want the final row to have as much weight in the analysis as the second row because the population densities are very different.

How is this done?

StupidWolf · Answer 1 · 2020-04-02T18:18:59.993

Below is a short summary of my thoughts and a quick search. I have never used a weighted t-test before, only weights in linear regression.

There is no clear definition for what would make a weighted t-test. The issue lies with how to use weights in estimating the error because that is the basis of your t-test. You can check out this discussion and maybe this paper on weights in linear regression.
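To make the ambiguity concrete, here is a minimal base-R sketch (not taken from any package) using the group A rows of the question's data. It computes one weighted mean but two common weighted variance estimates: one treats the weights as frequency counts, the other as reliability (relative importance) weights. The variable names are my own, and which interpretation suits population density is an assumption you would have to justify.

```r
# Group A rows from the question: scores and population-density weights
x <- c(10.549394, 13.392857, 29.591635, 29.591635, 29.591635)
w <- c(137.13861, 35.98619, 16.27796, 46.27796, 36.27796)

# The weighted mean is the same under either interpretation
m <- weighted.mean(x, w)

# Weighted sum of squared deviations around the weighted mean
ss <- sum(w * (x - m)^2)

# Frequency weights: w counts how many observations each row stands for
var_freq <- ss / (sum(w) - 1)

# Reliability weights: w only expresses relative importance
var_rel <- ss / (sum(w) - sum(w^2) / sum(w))

c(mean = m, var_freq = var_freq, var_rel = var_rel)
```

The two variance estimates differ, so any standard error and t statistic built on them will differ too, even though the weighted mean is identical.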

So your data:

df = structure(list(PopDens = c(93.53455, 137.13861, 35.98619, 89.698, 
16.27796, 25.33346, 89.698, 46.27796, 25.33346, 36.27796, 1.33346
), Score1 = c(17.985288, 10.549394, 13.392857, 8.644537, 29.591635, 
21.081301, 2.644537, 29.591635, 5.081301, 29.591635, 9.081301
), Group = structure(c(2L, 1L, 1L, 2L, 1L, 4L, 3L, 1L, 2L, 1L, 
2L), .Label = c("A", "B", "C", "F"), class = "factor")), class = "data.frame", row.names = c(NA, 
-11L))

We subset on only A and B:

df = subset(df,Group %in% c("A","B"))

And we can compare the results of a t-test and lm:

coefficients(summary(lm(Score1~ Group,data=df)))
             Estimate Std. Error   t value     Pr(>|t|)
(Intercept)  22.54343   3.653195  6.170881 0.0004580837
GroupB      -12.34532   5.479793 -2.252882 0.0589470215

t.test(df$Score1[df$Group=="B"],df$Score1[df$Group=="A"])

    Welch Two Sample t-test

data:  df$Score1[df$Group == "B"] and df$Score1[df$Group == "A"]
t = -2.404, df = 6.463, p-value = 0.05007
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -24.695931765   0.005282865
sample estimates:
mean of x mean of y 
 10.19811  22.54343

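The lm gives a p-value of 0.0589 for the difference of B from A, and the t-test gives 0.05007. They are not wildly different; the gap arises because lm assumes equal variances in the two groups, while t.test defaults to the Welch test, which does not.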

Now for a weighted linear regression:

coefficients(summary(lm(Score1~ Group,data=df,weights=PopDens)))
             Estimate Std. Error    t value   Pr(>|t|)
(Intercept) 17.845885   3.780246  4.7208269 0.00215547
GroupB      -5.466244   5.727617 -0.9543663 0.37168503

You can see that the coefficients are estimated differently: they are pulled towards the samples with higher weights.
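As a quick check of where those weighted coefficients come from, the sketch below rebuilds the A and B rows (the data frame name `d` is my own) and compares the weighted lm coefficients with plain weighted group means. With a single two-level factor as predictor, the intercept should equal the weighted mean of group A, and the GroupB coefficient the difference between the two weighted means.

```r
# Rows for groups A and B from the question
d <- data.frame(
  PopDens = c(93.53455, 137.13861, 35.98619, 89.69800, 16.27796,
              46.27796, 25.33346, 36.27796, 1.33346),
  Score1  = c(17.985288, 10.549394, 13.392857, 8.644537, 29.591635,
              29.591635, 5.081301, 29.591635, 9.081301),
  Group   = factor(c("B", "A", "A", "B", "A", "A", "B", "A", "B"))
)

# Weighted least squares with population density as the weight
fit <- lm(Score1 ~ Group, data = d, weights = PopDens)

# Weighted mean of Score1 within each group
wm <- sapply(split(d, d$Group),
             function(g) weighted.mean(g$Score1, g$PopDens))

# Intercept vs weighted mean of A; GroupB vs (weighted mean of B - A)
all.equal(unname(coef(fit)["(Intercept)"]), unname(wm["A"]))
all.equal(unname(coef(fit)["GroupB"]), unname(wm["B"] - wm["A"]))
```

So the weights change the point estimates in an easily interpretable way; the open question is only how they should enter the standard error.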

For the weighted t-test offered in package weights:

library(weights)
wtd.t.test(x=df$Score1[df$Group=="A"],y=df$Score1[df$Group=="B"],
weight=df$Score1[df$Group=="A"],weighty=df$Score1[df$Group=="B"],samedata=FALSE)
$test
[1] "Two Sample Weighted T-Test (Welch)"

$coefficients
   t.value         df    p.value 
2.90701563 6.97938063 0.02283172 

$additional
Difference     Mean.x     Mean.y   Std. Err 
 13.468496  25.884728  12.416232   4.633101

Apparently the weights in this weighted t-test are treated as frequency weights, but I am not sure. If you prefer to use this, it would be good to read the code in detail, since how the standard errors etc. are calculated is not very well documented.
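Note that the wtd.t.test call shown above passes Score1 as the weights, so its output is not weighted by population density. A sketch of the call with PopDens as the weights might look like the following. It assumes the weights package is installed, the vector names are my own, and I have not reproduced any output here.

```r
# Assumes the 'weights' package is installed: install.packages("weights")
library(weights)

# Scores and population-density weights for groups A and B (from the question)
x_a <- c(10.549394, 13.392857, 29.591635, 29.591635, 29.591635)
w_a <- c(137.13861, 35.98619, 16.27796, 46.27796, 36.27796)
x_b <- c(17.985288, 8.644537, 5.081301, 9.081301)
w_b <- c(93.53455, 89.69800, 25.33346, 1.33346)

# Two-sample weighted t-test, weighting each score by its PopDens
res <- wtd.t.test(x = x_a, y = x_b,
                  weight = w_a, weighty = w_b,
                  samedata = FALSE)

res$coefficients  # t value, degrees of freedom, p-value
res$additional    # difference, group means, standard error
```

I would expect Mean.x and Mean.y to match the PopDens-weighted group means used by the weighted lm, but the standard error may still be computed differently, so the p-values need not agree.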

This is great. In wtd.t.test, if Group A has 400 people and Group B has 600 people, so the total weight is 100, will the weights applied to Group A be 40% and the weights applied to Group B be 60%? Or does it just guarantee that the weights in Group A sum to 100% and those in Group B sum to 100%, without weighting across the groups? — Evan, Apr 02 '20 at 17:58
OK, in the code they use wtd.mean, which is sum(weights * x)/sum(weights). They do this separately for each group, so it's the latter: weights in A are 100%, B are 100% — StupidWolf, Apr 02 '20 at 18:21
There's no option for this in the code; you need to work out the math on how to achieve this kind of weighting. A quick guess is to rescale the weights so that the minimum or maximum weight in both groups is the same — StupidWolf, Apr 02 '20 at 18:38
@StupidWolf In your code for the "weighted t-test offered in package weights:" You set the weight arguments to use `df$Score1`, and not `df$PopDens`. Was this intentional? It doesn't seem like it is considering the population density as a weight the way it is coded, and might not be giving a comparable output to the weighted lm for that reason. — Adam Kemberling, Jul 05 '22 at 18:50

score 0 · Answer 2 · edited Jan 30 '22 at 18:01

If you have more than 2 groups, you could also do a weighted ANOVA with:

library(stats)
aov(Score1 ~ Group, data = df1, weights = PopDens)

answered Jan 29 '22 at 12:15 by R-helper; edited Jan 30 '22 at 18:01 by Valeri Voev
