2

I want to perform a t-test on mean age between men and women at time of arrest. However, my data is arranged like so:

Sex: Age:
M    21
F    31
F    42
M    43

Is there a way to separate the sex category into two separate categories (male and female) in order to perform my t-test? Or to perform a t-test within one category? Similar questions have been asked but none that seem to work on my data set. Thanks for any guidance you could offer!

zwer
Eliza Paige
  • Welcome to the site! Before posting a question, please read the guidelines for writing a minimal, complete verifiable code example (https://stackoverflow.com/help/mcve). Currently, your question appears to have more to do with statistics than it does programming. What language are you using? What have you tried so far? There's a site for statistics questions (https://stats.stackexchange.com/), but stackoverflow.com exists for another purpose. See https://stackoverflow.com/help/on-topic for details. – Austin May 19 '17 at 01:57
  • Sorry for the vagueness. I'm using R for the first time for a HS stats project (which I hope explains my minimal details). So far I've tried: men <- prof[ which(prof$gender=='M')]; women <- prof[ which(prof$gender=='F')]; t.test(men, women) – Eliza Paige May 19 '17 at 02:09
  • I've also tried to use the answer at https://stackoverflow.com/questions/41442344/how-do-i-make-a-t-test-across-several-groups-in-one-column-in-r but was unsure how to apply it to my own data. – Eliza Paige May 19 '17 at 02:12
  • Both answers are correct. Just be sure that you understand what you are asking R to do in each one. There are many ways to sort data. I chose my answer to support how you were already working with the data frame because it seemed like it would help you understand how to get where you were headed...but both are equally effective. – sconfluentus May 19 '17 at 04:13

3 Answers

4

First off, great first question and glad to see high school kids learning statistical programming!

Second: you are well on your way to the answer yourself, and this should help you get there.

I am making some assumptions:

  1. prof is the name of your data frame
  2. you are looking to compare the ages of the genders from prof in your t-test

You are working in the right direction with your logic. I added a few more made-up observations to my prof data frame, but here is how it should work:
# This is a comment, not code. It explains the reasoning and always starts with a hash (#).

women <- prof[which(prof$Sex == "F"), ] # notice the comma after the parenthesis
men <- prof[which(prof$Sex == "M"), ]   # notice the comma after the parenthesis here too

The part to the left of the comma selects rows, here the rows where Sex equals the given value. The part to the right selects columns; leaving it empty tells R to include all columns.
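To make the role of the comma concrete, here is a small sketch on a made-up four-row prof data frame (the values are the ones from the question; the names rows_m, men_ages and no_comma are only for illustration). It also shows why the attempt without the comma fails: a data frame indexed without a comma is indexed by column.

```r
# Made-up data for illustration; column names follow the question (Sex, Age).
prof <- data.frame(
  Sex = c("M", "F", "F", "M"),
  Age = c(21, 31, 42, 43)
)

rows_m <- which(prof$Sex == "M")  # row numbers where Sex is "M" (rows 1 and 4)

# With the comma: rows on the left, columns on the right (empty = all columns).
men <- prof[rows_m, ]

# Naming a column on the right returns just that column as a vector.
men_ages <- prof[rows_m, "Age"]

# Without the comma, the index is treated as column numbers, so this asks
# for columns 1 and 4. Column 4 does not exist, so R raises an error.
# try() captures the error so the script keeps running.
no_comma <- try(prof[rows_m], silent = TRUE)
inherits(no_comma, "try-error")  # TRUE when the call above failed
```

Printing men and men_ages after running this is a quick way to check which form you actually have before passing it to t.test.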

head(men);head(women) # shows you first 6 rows of each new frame
# you can see below that the data is still in a data frame

   Sex Age
1    M  21
4    M  43
5    M  12
6    M  36
7    M  21
10   M  23
   Sex Age
2    F  31
3    F  42
8    F  52
9    F  21
11   F  36

So, to t-test for age, you must ask for the data frame by name AND the column with Age, for example: men$Age

t.test(women$Age, men$Age) #this is the test

 # results below

Welch Two Sample t-test

data:  women$Age and men$Age
t = 0.59863, df = 10.172, p-value = 0.5625
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -11.93964  20.73964
sample estimates:
mean of x mean of y 
     36.4      32.0 

There is almost always more than one way to do something in R. Sometimes the initial subsetting is more complicated, but it makes working with the data easier down the road. So, if you would rather not pull Age out of a data frame each time, you can ask for that column in your initial subset:

women <- prof[which(prof$Sex == "F"), "Age"] # set women equal to just the ages where Sex is 'F'
men <- prof[which(prof$Sex == "M"), "Age"]   # set men equal to just the ages where Sex is 'M'

And review your data again, this time just a vector of ages for each variable:

head(women); head(men)
[1] 31 42 52 21 36
[1] 21 43 12 36 21 23

Then your t-test is a simple comparison:

t.test(women,men)
 # notice same results

    Welch Two Sample t-test

data:  women and men
t = 0.59863, df = 10.172, p-value = 0.5625
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -11.93964  20.73964
sample estimates:
mean of x mean of y 
     36.4      32.0 

It appears that your problem lies in three spots in your code:

  1. using gender=="F" when the column is named Sex
  2. not using the comma in your [,] to specify rows, then columns
  3. not addressing the column $Age in your t.test if the data is indeed still in two columns

The code above should get you where you need to be.

sconfluentus
0

A t-test comparing the ages of men to the ages of women can be done like:

df = data.frame(
    gender = c("M", "F", "F", "M"),
    age = c(21, 31, 42, 43)
)

t.test(age ~ gender, data = df)

This is the test that seems most relevant based on your question.

I'm not sure what you mean when you say "perform a t-test within one category": you can compare a set of values from one group to some known reference value like 0, but I'm not sure what that could tell you (other than that the men in your sample are not 0 years old).
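For completeness, a one-sample test against a reference value might look like the sketch below. The ages are made up, and mu = 30 is an arbitrary placeholder rather than anything taken from the question.

```r
# Made-up ages for a single group (not the asker's data).
men_ages <- c(21, 43, 12, 36, 21, 23)

# mu is the reference value the sample mean is tested against.
# 30 is an arbitrary placeholder; use a value that is meaningful
# for your question (for example, a published average).
result <- t.test(men_ages, mu = 30)

result$estimate  # sample mean of the group
result$p.value   # p-value for the hypothesis "the true mean equals mu"
```

This only makes sense when there is a real reference value to compare against; for comparing men with women, the two-sample test above is the one to use.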

Marius
0

You could try this code:

t.test(Age ~ Sex, paired = FALSE, data = datasetName)

It should give you the same result without the hassle of creating more subsets.
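As a quick check of the "same result" claim, the sketch below runs both approaches on a made-up data frame and compares them. The paired argument is left out because an unpaired test is already the default. The data and the names by_formula and by_subsets are assumptions for illustration.

```r
# Made-up data; column names follow the question (Sex, Age).
prof <- data.frame(
  Sex = c("M", "F", "F", "M", "M", "F"),
  Age = c(21, 31, 42, 43, 36, 52)
)

# Formula interface: Age is split by the groups in Sex.
# Groups are ordered alphabetically, so "F" comes first.
by_formula <- t.test(Age ~ Sex, data = prof)

# Subset approach: extract each group's ages, then compare them.
women <- prof$Age[prof$Sex == "F"]
men   <- prof$Age[prof$Sex == "M"]
by_subsets <- t.test(women, men)

# Both calls run the same Welch two-sample test.
all.equal(unname(by_formula$statistic), unname(by_subsets$statistic))
all.equal(by_formula$p.value, by_subsets$p.value)
```

Note that the order of the groups matters for the sign of t: the formula version compares F to M, so the matching subset call is t.test(women, men), not t.test(men, women).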

Rachit Kinger