
Every t-test I run outputs nan for both the statistic and the p-value. I have checked my dataframes and they look fine. Does anyone know what's happening? Thanks in advance!

import scipy as sp
import scipy.stats
from scipy.stats import ttest_ind

# e_tr is the asker's DataFrame (not shown in the question)
e_tr.groupby('Rest Periods')['Wages and Hours'].mean()

# t-test
cat1 = e_tr[e_tr['Rest Periods']==0]
cat2 = e_tr[e_tr['Rest Periods']==1]
# cat1['Wages and Hours'].value_counts()
sp.stats.ttest_ind(cat1.dropna()['Rest Periods'], cat2.dropna()['Rest Periods'])
ttest_ind(cat1['Wages and Hours'], cat2['Wages and Hours'])

Output: Ttest_indResult(statistic=nan, pvalue=nan)

tianlinhe
Matthias Gallagher
    The question is not fully transparent, see [here](https://stackoverflow.com/help/minimal-reproducible-example) for more information. E.g. what is `e_tr` here and what libraries are you using? It may only be obvious to you. – colidyre Apr 08 '20 at 10:24
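For illustration, here is the kind of minimal, self-contained reproduction the comment asks for. The column names come from the question, but the values are made up (hypothetical), since the real `e_tr` is not shown. It also hints at why the dataframes can "look fine": pandas' `mean()` skips NaN by default, so the groupby means come out as ordinary numbers even though `ttest_ind` returns nan.

```python
import numpy as np
import pandas as pd
from scipy.stats import ttest_ind

# Hypothetical stand-in for e_tr; the asker's real data is not shown.
e_tr = pd.DataFrame({
    "Rest Periods":    [0, 0, 0, 0, 1, 1, 1, 1],
    "Wages and Hours": [2.0, 3.0, np.nan, 4.0, 5.0, 6.0, 7.0, 8.0],
})

# pandas skips NaN when averaging, so these group means look normal.
means = e_tr.groupby("Rest Periods")["Wages and Hours"].mean()
print(means)

# Split into the two groups, as in the question.
cat1 = e_tr[e_tr["Rest Periods"] == 0]
cat2 = e_tr[e_tr["Rest Periods"] == 1]

# A single missing value in cat1 is enough: with the default
# nan_policy="propagate", both the statistic and the p-value are NaN.
result = ttest_ind(cat1["Wages and Hours"], cat2["Wages and Hours"])
print(result)
```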

1 Answer


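That is possibly because your test and control groups still contain np.nan in the 'Wages and Hours' column. By default, scipy.stats.ttest_ind uses nan_policy='propagate', so a single NaN in either sample makes both the statistic and the p-value nan. Try to clean your data first: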

e_tr=e_tr[e_tr['Wages and Hours'].notnull()]

Then assign case and control as you did:

cat1 = e_tr[e_tr['Rest Periods']==0]
cat2 = e_tr[e_tr['Rest Periods']==1]

So now if you run:

ttest_ind(cat1['Wages and Hours'], cat2['Wages and Hours'])

it should give you the expected statistics.
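A sketch of two equivalent ways to get a real result, again on hypothetical made-up data rather than the asker's: drop the rows where the tested column is missing (as described above), or keep the data as-is and pass `nan_policy="omit"` to `ttest_ind` so SciPy ignores the NaNs itself. Both should give the same statistic and p-value here.

```python
import numpy as np
import pandas as pd
from scipy.stats import ttest_ind

# Hypothetical data standing in for e_tr (not the asker's real data).
e_tr = pd.DataFrame({
    "Rest Periods":    [0, 0, 0, 0, 1, 1, 1, 1],
    "Wages and Hours": [2.0, 3.0, np.nan, 4.0, 5.0, 6.0, 7.0, 8.0],
})

# Option 1: drop rows where the tested column is missing, then split.
clean = e_tr[e_tr["Wages and Hours"].notnull()]
cat1 = clean[clean["Rest Periods"] == 0]
cat2 = clean[clean["Rest Periods"] == 1]
res_dropped = ttest_ind(cat1["Wages and Hours"], cat2["Wages and Hours"])

# Option 2: leave the NaNs in and let SciPy omit them.
raw1 = e_tr.loc[e_tr["Rest Periods"] == 0, "Wages and Hours"]
raw2 = e_tr.loc[e_tr["Rest Periods"] == 1, "Wages and Hours"]
res_omitted = ttest_ind(raw1, raw2, nan_policy="omit")

print(res_dropped)
print(res_omitted)
```

Filtering with `notnull()` on the tested column only is also a bit safer than `cat1.dropna()` from the question, which drops a row if any of its columns is missing, not just 'Wages and Hours'.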

tianlinhe