I am trying to use a t-test to answer one of my questions. To do so, I created a subset of the data and then applied a chi-square test to check whether the data are suitable for a t-test. According to the results, the p-value is approximately 3.5, which is impossible. I thought this might be caused by the sample size of the subset I specified, or by the sample size of the dependent variable (I computed it as a new column; its size is ~178).

In detail: the code I am sharing is for the project's first question (GitHub link attached).

Dependent variable: Delay; independent variable: Gender.

The code I tried:

Subset data

male = df.query('Gender == "0"')['Delay']

female = df.query('Gender == "1"')['Delay']

df.groupby('Gender').describe()

Create contingency table

GD = pd.crosstab(index=df['Gender'], columns=df['Delay'], margins=True)

GD

Chi-square test

chiRes = stats.chi2_contingency(GD)

print(f'chi-square statistic: {chiRes[0]}')

print(f'p-value: {chiRes[1]}')

print(f'degree of freedom: {chiRes[2]}')

print('expected contingency table')

print(chiRes[3])

And these are the findings:

chi-square statistic: 519.651581316998

p-value: 3.590660196919681e-19 (?)

degree of freedom: 262 (?)
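One possible explanation for the large degrees of freedom, offered as a sketch rather than a diagnosis: `chi2_contingency` treats every distinct `Delay` value as its own category, and a table built with `margins=True` also passes the "All" totals row and column into the test. The toy data below is made up for illustration and is not the dataset from the question.

```python
import pandas as pd
from scipy import stats

# Deterministic toy data (hypothetical, not the asker's dataset):
# two groups and five distinct "Delay" values.
toy = pd.DataFrame({
    "Gender": [i % 2 for i in range(200)],
    "Delay": [(i // 2) % 5 for i in range(200)],
})

# Same table, with and without the "All" totals row/column.
with_margins = pd.crosstab(toy["Gender"], toy["Delay"], margins=True)
without_margins = pd.crosstab(toy["Gender"], toy["Delay"])

# chi2_contingency returns (statistic, p-value, dof, expected).
dof_with = stats.chi2_contingency(with_margins)[2]
dof_without = stats.chi2_contingency(without_margins)[2]

# Degrees of freedom are (rows - 1) * (columns - 1) of whatever table is passed in.
print(with_margins.shape, dof_with)
print(without_margins.shape, dof_without)
```

With a numeric column that has many distinct values, the column count, and therefore the degrees of freedom, grows accordingly.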

As a second approach, I tried the Shapiro-Wilk test for normality.

The code (stats.shapiro(male)) does not even run; it raises this error:

ValueError: Data must be at least length 3.

Lastly, I ran the t-test anyway to see whether it would clarify anything, but it did not.

rp.ttest(group1= df['Delay'][df['Gender'] == '0'], group1_name= "Male",

group2= df['Delay'][df['Gender'] == '1'], group2_name= "Female")

Output: Mean, SD, SE, and Conf. Interval all came back as NaN (although I know the data has no missing values).

How can I use a statistical test with this dataset? Is there anything else I should be aware of?
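A "length 3" error from Shapiro-Wilk together with all-NaN t-test summaries would be consistent with both subsets being empty. One way that can happen, assuming `Gender` is stored as integers rather than strings (an assumption, since the column's dtype is not shown), is that a comparison against `'0'` matches no rows. A minimal sketch on a made-up frame:

```python
import pandas as pd

# Toy frame (hypothetical): Gender stored as integers 0/1.
toy = pd.DataFrame({
    "Gender": [0, 1, 0, 1, 0, 1],
    "Delay": [3, 5, 2, 8, 4, 6],
})

# Comparing an integer column with a string matches nothing.
as_string = toy.loc[toy["Gender"] == "0", "Delay"]
# Comparing with the integer value selects the intended rows.
as_number = toy.loc[toy["Gender"] == 0, "Delay"]

print(toy["Gender"].dtype)            # check how the column is actually stored
print(len(as_string), len(as_number))
```

Checking `df['Gender'].dtype`, `df['Gender'].unique()`, and `len(male)` on the real data would confirm or rule this out.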

Merve
  • `p-value: 3.590660196919681e-19 (?)` is not approximately 3.59, but approximately 0. Note the `e-19` part. This is scientific notation for "the previous number times 10 to the power of -19". In other words, move the decimal point 19 places to the left, filling up with zeros. So this number is practically zero. There is either something wrong with your data, or the effect should be so obvious that you do not need a statistical test. – Arne Jan 28 '22 at 10:41
  • Also note that a chi square test alone cannot tell you whether the assumptions of a t test hold. – Arne Jan 28 '22 at 10:45
  • Can you focus your question? Arne is correct: the p-value of your chi-square test is very low, but the test is practically useless here. If you want to test for differences, a Wilcoxon or t-test will suffice. – StupidWolf Jan 28 '22 at 13:08
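As a rough illustration of the points raised in the comments, the sketch below prints the reported p-value in fixed-point form and then compares two groups directly with a Welch t-test and a Mann-Whitney U test (the rank-sum test for two independent samples). The group data, sizes, and distribution parameters are synthetic placeholders, not the question's dataset.

```python
import numpy as np
from scipy import stats

# The p-value from the question, written out without scientific notation.
p = 3.590660196919681e-19
print(f"{p:.25f}")
print(p < 0.05)

# Synthetic stand-ins for the two Delay groups (arbitrary sizes and parameters).
rng = np.random.default_rng(42)
male = rng.normal(loc=10.0, scale=3.0, size=90)
female = rng.normal(loc=11.0, scale=3.0, size=88)

# Welch's t-test: does not assume equal variances in the two groups.
t_res = stats.ttest_ind(male, female, equal_var=False)

# Mann-Whitney U: a rank-based alternative that does not assume normality.
u_res = stats.mannwhitneyu(male, female, alternative="two-sided")

print(t_res.statistic, t_res.pvalue)
print(u_res.statistic, u_res.pvalue)
```

Which of the two tests is more appropriate depends on the distribution of `Delay`; the sketch only shows how each one is called.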
