First off, great first question and glad to see high school kids learning statistical programming!
Second, you are well on your way to the answer yourself; this should help you get there.
I am making some assumptions:
1. prof is the name of your data frame.
2. You are looking to compare the ages of the genders from prof in your t-test.
You are working in the right direction with your logic. I added a few more made-up observations to my prof data frame, but here is how it should work:
# this is a comment: R does not run it, but it explains the reasoning. Comments always start with a hash sign (#)
women <- prof[which(prof$Sex == "F"), ]  # notice the comma after the closing parenthesis
men <- prof[which(prof$Sex == "M"), ]    # notice the comma after the parenthesis here too
The part to the left of the comma selects the rows where the condition is TRUE (here, the rows where Sex is "F" or "M"). The part to the right of the comma tells R which columns to keep; leaving it empty tells R to include all columns.
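Here is a minimal sketch of the [rows, columns] idea on a tiny made-up data frame. prof_demo is a hypothetical stand-in for prof; the values are invented just to show which part of the brackets picks what.

```r
# Hypothetical stand-in for prof (made-up values, not the asker's data)
prof_demo <- data.frame(
  Sex = c("M", "F", "F", "M", "M"),
  Age = c(21, 31, 42, 43, 12)
)

# Rows where Sex is "F", all columns (nothing after the comma)
prof_demo[which(prof_demo$Sex == "F"), ]

# Rows 1 and 2, only the "Age" column
prof_demo[c(1, 2), "Age"]

# All rows (nothing before the comma), only the "Sex" column
prof_demo[, "Sex"]
```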
head(men); head(women)  # shows the first 6 rows of each new data frame
# you can see below that the data is still in a data frame
Sex Age
1 M 21
4 M 43
5 M 12
6 M 36
7 M 21
10 M 23
Sex Age
2 F 31
3 F 42
8 F 52
9 F 21
11 F 36
So to t-test for age, you must refer to the data frame by name AND to the Age column, for example men$Age:
t.test(women$Age, men$Age) #this is the test
# results below
Welch Two Sample t-test
data: women$Age and men$Age
t = 0.59863, df = 10.172, p-value = 0.5625
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-11.93964 20.73964
sample estimates:
mean of x mean of y
36.4 32.0
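If you are curious what t.test() is computing for the Welch test, here is a minimal sketch that works out the t statistic and degrees of freedom by hand. The ages below are made up (they are not the data behind the output above), so the numbers will differ; the point is that the hand calculation matches t.test() on the same vectors.

```r
# Made-up ages for illustration only (not the asker's prof data)
women_age <- c(31, 42, 52, 21, 36)
men_age   <- c(21, 43, 12, 36, 21, 23, 40, 35)

# Squared standard error of each group's mean; Welch keeps each group's
# own variance instead of pooling them
se2_w <- var(women_age) / length(women_age)
se2_m <- var(men_age) / length(men_age)

# Welch t statistic: difference in means divided by its standard error
t_by_hand <- (mean(women_age) - mean(men_age)) / sqrt(se2_w + se2_m)

# Welch-Satterthwaite approximation for the degrees of freedom
df_by_hand <- (se2_w + se2_m)^2 /
  (se2_w^2 / (length(women_age) - 1) + se2_m^2 / (length(men_age) - 1))

# Compare with the built-in test (Welch is the default: var.equal = FALSE)
res <- t.test(women_age, men_age)
c(by_hand = t_by_hand, t.test = unname(res$statistic))
c(by_hand = df_by_hand, t.test = unname(res$parameter))
```

This is also why the output says "Welch Two Sample t-test" and the df is not a whole number.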
There is almost always more than one way to do things in R. Sometimes the initial subsetting is more complicated, but working with the data down the road is easier. So, if you would rather not pull Age out of a data frame later, you can ask for the column in your initial subset:
women <- prof[which(prof$Sex == "F"), "Age"]  # set women equal to just the ages where Sex is 'F'
men <- prof[which(prof$Sex == "M"), "Age"]    # set men equal to just the ages where Sex is 'M'
Then review your data again; this time each variable is just a vector of ages:
head(women); head(men)
[1] 31 42 52 21 36
[1] 21 43 12 36 21 23
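Why is it a plain vector this time? A minimal sketch on a small made-up data frame (prof_demo is a hypothetical stand-in for prof): when you ask a base R data.frame for a single column after the comma, R drops the result down to a vector; adding drop = FALSE keeps it as a one-column data frame.

```r
# Hypothetical stand-in for prof (made-up values)
prof_demo <- data.frame(
  Sex = c("M", "F", "F", "M"),
  Age = c(21, 31, 42, 43)
)

# One column named after the comma: a base data.frame drops to a plain vector
ages_f <- prof_demo[which(prof_demo$Sex == "F"), "Age"]
is.vector(ages_f)   # TRUE
class(ages_f)       # "numeric"

# drop = FALSE keeps the result as a one-column data frame instead
ages_f_df <- prof_demo[which(prof_demo$Sex == "F"), "Age", drop = FALSE]
class(ages_f_df)    # "data.frame"
```

One caveat: if prof is a tibble (from the tidyverse) rather than a base data.frame, single-column [ ] subsetting does not drop to a vector, so the women$Age style from the first approach is the safer habit there.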
Then your t-test is a simple comparison:
t.test(women,men)
# notice: same results as before
Welch Two Sample t-test
data: women and men
t = 0.59863, df = 10.172, p-value = 0.5625
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-11.93964 20.73964
sample estimates:
mean of x mean of y
36.4 32.0
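One more way, beyond the two shown here: t.test() also accepts a formula, Age ~ Sex, together with the data frame, and it does the splitting by Sex for you. A sketch on made-up data (prof_demo is hypothetical); it assumes Sex has exactly two distinct values, because the formula version needs a two-group grouping variable.

```r
# Hypothetical stand-in for prof (made-up values, not the asker's data)
prof_demo <- data.frame(
  Sex = c("M", "F", "F", "M", "M", "M", "F", "M"),
  Age = c(21, 31, 42, 43, 12, 36, 52, 23)
)

# Formula interface: Age split by the two groups of Sex, no manual subsetting
res_formula <- t.test(Age ~ Sex, data = prof_demo)

# Same test written with two subsets, in the same group order (F first, then M)
res_subset <- t.test(prof_demo$Age[prof_demo$Sex == "F"],
                     prof_demo$Age[prof_demo$Sex == "M"])

res_formula$statistic
res_subset$statistic
```

The groups are taken in alphabetical order of Sex, so "F" plays the role of x and "M" of y, just as in t.test(women, men).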
It appears that your problem lies in three spots in your code:
- using gender=="F" when the column is named Sex
- not using the comma in your [,] to specify rows, then columns
- not addressing the column $Age in your t.test if your subset is still a data frame with two columns
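To see the first two problems in action, here is a sketch on a small made-up data frame. prof_demo is hypothetical, and referencing the column as prof_demo$gender is an assumption about how the original code was written; your own error message may differ.

```r
# Hypothetical stand-in for prof (made-up values)
prof_demo <- data.frame(
  Sex = c("M", "F", "F", "M"),
  Age = c(21, 31, 42, 43)
)

# Without the comma, a single index on a data frame selects COLUMNS, not rows.
# which() returns positions 2 and 3 here, and there is no column 3, so R errors.
no_comma <- tryCatch(prof_demo[which(prof_demo$Sex == "F")],
                     error = function(e) e)
inherits(no_comma, "error")                        # TRUE

# A column name that does not exist returns NULL, so nothing matches
# and the subset silently comes back with 0 rows
prof_demo$gender                                   # NULL
nrow(prof_demo[which(prof_demo$gender == "F"), ])  # 0

# With the comma and the right column name, the rows are selected as intended
prof_demo[which(prof_demo$Sex == "F"), ]
```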
The code above should get you where you need to be.