I have a df1 like this:
Stabr Area_name Score1 Score2 POVALL_2018 Score3
3 AL Autauga County 2 2 7,587 13.8
4 AL Baldwin County 2 2 21,069 9.8
7 AL Blount County 2 1 7,527 13.2
8 AL Bullock County 3 6 3,610 42.5
9 AL Butler County 3 6 4,731 24.5
10 AL Calhoun County 3 2 21,719 19.5
11 AL Chambers County 6 5 6,181 18.7
12 AL Cherokee County 2 6 4,180 16.3
13 AL Chilton County 2 1 7,542 17.3
14 AL Choctaw County 3 10 2,806 22.1
16 AL Clay County 9 10 2,285 17.6
17 AL Cleburne County 8 4 2,356 16.0
I only care about the columns Score1 and Score3. I would like to perform a simple t-test to see whether the counties with a Score1 of 2 have a different Score3 than the counties with a Score1 of 3.
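For reference, one common form of this comparison is Welch's two-sample t-test, which does not assume the two groups have equal variances. This is an assumption about which t-test variant is wanted. Writing $\bar{x}_k$, $s_k^2$ and $n_k$ for the mean, sample variance and number of Score3 values among counties with Score1 $= k$, its statistic is:

```latex
t = \frac{\bar{x}_2 - \bar{x}_3}{\sqrt{\dfrac{s_2^2}{n_2} + \dfrac{s_3^2}{n_3}}},
\qquad n_2 = 5,\; n_3 = 4 \text{ for the rows shown above.}
```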
Concretely, I would like to know whether the mean of 13.8, 9.8, 13.2, 16.3, and 17.3 is significantly different from the mean of 42.5, 24.5, 19.5, and 22.1. All rows with a Score1 other than 2 or 3 should be ignored.
How is this done?
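A minimal sketch of one way to do this with pandas and SciPy. It rebuilds only the Area_name, Score1 and Score3 columns from the table above (an assumption for self-containment; with the real df1 you can skip that part), keeps the rows whose Score1 is 2 or 3, and runs `scipy.stats.ttest_ind` on the two Score3 groups. Passing `equal_var=False` selects Welch's test; this is a judgment call, and the default `equal_var=True` gives Student's t-test instead. The group means come out to 14.08 and 27.15.

```python
import pandas as pd
from scipy import stats

# Assumed reconstruction of the relevant columns of df1 from the table above.
df1 = pd.DataFrame(
    {
        "Area_name": [
            "Autauga County", "Baldwin County", "Blount County", "Bullock County",
            "Butler County", "Calhoun County", "Chambers County", "Cherokee County",
            "Chilton County", "Choctaw County", "Clay County", "Cleburne County",
        ],
        "Score1": [2, 2, 2, 3, 3, 3, 6, 2, 2, 3, 9, 8],
        "Score3": [13.8, 9.8, 13.2, 42.5, 24.5, 19.5,
                   18.7, 16.3, 17.3, 22.1, 17.6, 16.0],
    },
    index=[3, 4, 7, 8, 9, 10, 11, 12, 13, 14, 16, 17],
)

# Ignore every row whose Score1 is not 2 or 3.
subset = df1[df1["Score1"].isin([2, 3])]

# Split Score3 into the two groups being compared.
group2 = subset.loc[subset["Score1"] == 2, "Score3"]
group3 = subset.loc[subset["Score1"] == 3, "Score3"]

# Welch's two-sample t-test (does not assume equal variances).
result = stats.ttest_ind(group2, group3, equal_var=False)

print(f"mean Score3 (Score1 == 2): {group2.mean():.2f}")
print(f"mean Score3 (Score1 == 3): {group3.mean():.2f}")
print(f"t = {result.statistic:.3f}, p = {result.pvalue:.4f}")
```

A small p-value (for example below 0.05) would suggest the two means differ; with only 5 and 4 observations per group, the result should be read with care.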