I have a dataframe like this:

    Region   Votes
    A        23
    B        26
    A        32
    B        46
    A        32
    B        24

I calculated the mean of votes for regions A and B with the following code: data.groupby('Region')['Votes'].mean(). Now I have to do a t-test to determine whether this difference is statistically significant. I tried this code:

one = data[data['Region']=='one']
two = data[data['Region']=='two']

print(st.ttest_ind(one['Votes'], two['Votes']))

I'm getting nan in the output instead of values, i.e.

  Ttest_indResult(statistic=nan, pvalue=nan)

Can somebody tell me what I'm doing wrong?


1 Answer

If you change:

one = data[data['Region']=='one']
two = data[data['Region']=='two']

to

one = data[data['Region']=='A']
two = data[data['Region']=='B']

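it will work. The Region column contains 'A' and 'B', not 'one' and 'two', so the original filters select no rows and the t-test on two empty samples returns nan. Or, do it all at once using: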

st.ttest_ind(data.loc[data.Region == 'A', 'Votes'], data.loc[data.Region == 'B', 'Votes'])
#Ttest_indResult(statistic=-0.3927922024247863, pvalue=0.7145066681331176)
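
A quick way to confirm the cause is to check how many rows each filter actually selects. The following is a minimal sketch, assuming the dataframe from the question is rebuilt by hand and that st refers to scipy.stats (the question does not show its imports); the variable names wrong, votes_a and votes_b are only illustrative.

```python
import pandas as pd
from scipy import stats as st  # assumption: this is what `st` is in the question

# Rebuild the example dataframe from the question
data = pd.DataFrame({
    "Region": ["A", "B", "A", "B", "A", "B"],
    "Votes": [23, 26, 32, 46, 32, 24],
})

# 'one' never occurs in Region, so the boolean mask is all False
wrong = data.loc[data["Region"] == "one", "Votes"]
print(len(wrong))  # an empty selection is what was passed to ttest_ind

# Filtering on labels that actually occur gives two non-empty samples
votes_a = data.loc[data["Region"] == "A", "Votes"]
votes_b = data.loc[data["Region"] == "B", "Votes"]
result = st.ttest_ind(votes_a, votes_b)
print(result)
```

Printing data['Region'].unique() before filtering is another easy way to see which labels are available.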

Or use a groupby, converting the Votes from each region to a list first:

gb = data.groupby('Region')['Votes'].apply(list)
st.ttest_ind(*gb)
#Ttest_indResult(statistic=-0.3927922024247863, pvalue=0.7145066681331176)
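
Note that unpacking with *gb only works when there are exactly two regions, and the order of the arguments follows the sorted group labels. If that is a concern, the groups can be looked up by name instead. This is a sketch under the same assumptions as before (st is scipy.stats, dataframe rebuilt from the question); the groups dictionary is just an illustrative helper.

```python
import pandas as pd
from scipy import stats as st  # assumption: this is what `st` is in the question

data = pd.DataFrame({
    "Region": ["A", "B", "A", "B", "A", "B"],
    "Votes": [23, 26, 32, 46, 32, 24],
})

# Map each region label to its votes, so samples are picked by name
groups = {
    region: votes.to_numpy()
    for region, votes in data.groupby("Region")["Votes"]
}

# Compare exactly the two regions of interest, regardless of how many exist
student = st.ttest_ind(groups["A"], groups["B"])
print(student)
```

A misspelled label then raises a KeyError right away instead of silently producing nan.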