I have the following code:

Townames = []
Notowns = []
def run_ttest():
    for key, value in enumerate(data['RegionName']):
        if value in stateslist:
            indexing = data['differ'].iloc[key]
            Townames.append(indexing)
        else:
            indexing = data['differ'].iloc[key]
            Notowns.append(indexing)
    Unitowns = pd.DataFrame(columns=['Unitownvalues'])
    Notunitowns = pd.DataFrame(columns=['Notunitownvalues'])
    Unitowns['Unitownvalues'] = Townames
    Notunitowns['Notunitownvalues'] = Notowns
    Unitowns = Unitowns.dropna(subset=['Unitownvalues'])
    Notunitowns = Notunitowns.dropna(subset=['Notunitownvalues'])
    return
run_ttest()
from scipy import stats
stats.ttest_ind(Unitowns['Unitownvalues'], Notunitowns['Notunitownvalues'])

However, my output is:

Ttest_indResult(statistic=nan, pvalue=nan)
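
For reference, scipy.stats.ttest_ind returns nan for both the statistic and the p-value whenever either input still contains at least one NaN, because its default nan_policy is 'propagate'. The small reproduction below uses made-up numbers (not the original data) to show this, and shows that nan_policy='omit' gives the same result as dropping the NaN by hand.

```python
import numpy as np
from scipy import stats

# Made-up samples for illustration only; b contains a single NaN
a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([2.0, 3.0, np.nan, 5.0])

# Default nan_policy='propagate': one NaN makes both results nan
res_propagate = stats.ttest_ind(a, b)
print(res_propagate)

# nan_policy='omit' ignores the NaN and tests the remaining values
res_omit = stats.ttest_ind(a, b, nan_policy='omit')
print(res_omit)
```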

I cannot understand why this is.

I removed the NaN values from Unitowns['Unitownvalues'] and Notunitowns['Notunitownvalues'] above, using dropna.
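
One likely cause, judging only from the code as posted: Unitowns and Notunitowns are assigned inside run_ttest, so they are local names, and the bare return hands back None. The dropna calls therefore never reach the Unitowns and Notunitowns used in the ttest_ind call. If that call runs at all rather than raising a NameError, those names must already exist outside the function (presumably from an earlier run) and may still contain NaNs. Below is a minimal sketch of one fix, which returns the cleaned values from the function. The data frame, the stateslist values, and the use of Series.isin in place of the enumerate loop are illustrative assumptions, not the original data or code.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical stand-ins for the question's `data` and `stateslist`
data = pd.DataFrame({
    'RegionName': ['A', 'B', 'C', 'D', 'E', 'F'],
    'differ': [1.0, 2.5, np.nan, 3.0, 4.5, 2.0],
})
stateslist = ['A', 'B', 'C']

def run_ttest():
    # Boolean mask: True where the region is in stateslist
    in_list = data['RegionName'].isin(stateslist)
    # Split 'differ' by membership and drop NaNs from each group
    unitowns = data.loc[in_list, 'differ'].dropna()
    notunitowns = data.loc[~in_list, 'differ'].dropna()
    # Return the cleaned Series so the caller actually receives them
    return unitowns, notunitowns

Unitowns, Notunitowns = run_ttest()
result = stats.ttest_ind(Unitowns, Notunitowns)
print(result)
```

Building the groups inside the function also avoids the module-level Townames and Notowns lists, which in the original code keep growing every time run_ttest is called.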

Would anybody be able to give me a helping hand?

Caledonian26

1 Answer

Make sure you add the keyword argument equal_var=True (this assumes that the two datasets you are comparing have equal variances):

stats.ttest_ind(Unitowns['Unitownvalues'], Notunitowns['Notunitownvalues'], equal_var=True)

rather than:

stats.ttest_ind(Unitowns['Unitownvalues'], Notunitowns['Notunitownvalues'])

This then gave me an output of:

Ttest_indResult(statistic=0.38697667088831, pvalue=0.69878181110717441)
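
Note that in SciPy, equal_var=True is already the default for ttest_ind, so passing it explicitly selects the same pooled-variance (Student's) t-test as the call without it. On its own it cannot turn a nan result into a number; the finite output above indicates that the inputs on that run contained no NaNs. The sketch below uses made-up, NaN-free samples to show that the default and explicit calls agree, while equal_var=False runs Welch's t-test, which does not assume equal variances.

```python
import numpy as np
from scipy import stats

# Made-up NaN-free samples with different sizes and spreads
a = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
b = np.array([2.0, 4.0, 6.0, 8.0])

default = stats.ttest_ind(a, b)                   # equal_var defaults to True
explicit = stats.ttest_ind(a, b, equal_var=True)  # same test, stated explicitly
welch = stats.ttest_ind(a, b, equal_var=False)    # Welch's t-test

print(default)
print(explicit)
print(welch)
```

When the two groups may have different variances, equal_var=False is usually the safer choice; when their variances and sizes are similar, the two tests give close results.
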
Caledonian26