In R, I want to run a statistical test to compare the averages between two categories, but I do not know how to organise my data to do so.

Mock example

My data is organised like:

structure(list(age = c(39, 45, 83, 68, 48, 52, 66, 50, 61, 67), gender = 
structure(c(2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L), .Label = c("female", 
"male"), class = "factor")), .Names = c("age", "gender"), row.names = c(NA, 
10L), class = "data.frame")

What I want to do is compare the average age of each gender with a Welch t-test, answering the question "are women's ages significantly different from men's ages?".

Theoretically, to run the test, I think my data should be in the form:

male  female
39    45
83    61
...

I'm sure there is either a way to run the test directly on the original table or an easy way to transform my data into this form...

So, how should I proceed?

M--
francoiskroll
  • Do you have the same number of females and males in your original data frame? – M-- Jun 07 '17 at 15:03
  • No. And I actually have a lot of categorical variables I want to compare age with (not necessarily binary like gender) – francoiskroll Jun 07 '17 at 15:05
  • Your title about grouping data and asking about it is misleading. That's kinda an [XY problem](https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem). I would suggest focusing on t-test and how to perform it. – M-- Jun 07 '17 at 15:07
  • Oh OK, I see. Thanks for the feedback, I'm changing the title – francoiskroll Jun 07 '17 at 15:09
  • `t.test(df$age~df$gender)` – M-- Jun 07 '17 at 15:15

2 Answers

If df is your data set, you can do

t.test(age ~ gender, data=df, alternative='two.sided')

and there's no need to reorganise the data.
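As a minimal sketch of the call above on the mock data from the question (the names df and res are just illustrative, and the data frame is rebuilt here so the snippet runs on its own): t.test does not assume equal variances by default (var.equal = FALSE), so the formula call already gives the Welch test.

```r
# Rebuild the mock data from the question (10 rows, 2 females and 8 males).
df <- data.frame(
  age = c(39, 45, 83, 68, 48, 52, 66, 50, 61, 67),
  gender = factor(c("male", "female", "male", "male", "male",
                    "male", "male", "male", "female", "male"))
)

# Two-sample comparison of age between the levels of gender.
# var.equal defaults to FALSE, so this is the Welch variant.
res <- t.test(age ~ gender, data = df)

res$method    # name of the test that was run
res$estimate  # group means: 53 for female, 59.125 for male
res$p.value   # p-value for the difference in means
```

The formula interface needs a grouping variable with exactly two levels, so a categorical variable with more levels would have to be subset to two groups first.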

Ernest A

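I would go with data.table. Assuming dt is a data.table (created with dt <- data.table(dataBase), where dataBase is the original data frame):

library(data.table)

dt[, t.test(age), by = gender]

Note that this runs a separate one-sample t-test of age within each gender (against a mean of 0), not a two-sample comparison between the genders. Each gender appears on two rows because conf.int holds both the lower and the upper bound of the confidence interval. The result is: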

   gender statistic parameter      p.value  conf.int estimate null.value alternative            method data.name
1:   male  11.73781         7 7.373447e-06  47.21406   59.125          0   two.sided One Sample t-test       age 
2:   male  11.73781         7 7.373447e-06  71.03594   59.125          0   two.sided One Sample t-test       age
3: female   6.62500         1 9.537357e-02 -48.64964   53.000          0   two.sided One Sample t-test       age
4: female   6.62500         1 9.537357e-02 154.64964   53.000          0   two.sided One Sample t-test       age
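For the two-sample comparison the question asks about, the groups can also be passed to t.test as two separate vectors, which is close to the male/female layout sketched in the question. A minimal base-R sketch, assuming the mock data is stored in a data frame named df (the names df, ages and res are illustrative):

```r
# Rebuild the mock data from the question.
df <- data.frame(
  age = c(39, 45, 83, 68, 48, 52, 66, 50, 61, 67),
  gender = factor(c("male", "female", "male", "male", "male",
                    "male", "male", "male", "female", "male"))
)

# split() returns a named list with one vector of ages per gender,
# so the groups do not need to have the same length.
ages <- split(df$age, df$gender)

# Welch two-sample t-test on the two vectors (var.equal = FALSE is the default).
res <- t.test(ages$male, ages$female)

res$estimate  # mean of x (male) = 59.125, mean of y (female) = 53
res$p.value
```

This gives the same p-value as the formula version t.test(age ~ gender, data = df); only the sign of the t statistic flips, because the groups are entered in the opposite order.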
M--
amonk