I'm working on an assignment for the Data Analysis Tools course on Coursera and I've run into a wall with my code. The assignment is to run an analysis of variance (ANOVA) to compare group means. Using the NESARC study, I'm trying to test whether the number of episodes of alcohol abuse is related to a family history of alcohol abuse.
My response (quantitative) variable is S2BQ3B, the number of episodes of alcohol abuse (1-99). My explanatory variable is 'FAMHIST', which I built by adding S2DQ1 and S2DQ2 together, so that it covers respondents who answered yes to alcohol abuse for both father and mother.
When I run the test and print the OLS summary, I get inf for the F-statistic and nan for the p-value. I added a .dropna() to my dataset, but that does not seem to have helped.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
import statsmodels.stats.multicomp as multi
data = pd.read_csv('nesarc_pds.csv', low_memory=False)
#Setting variables to numeric
data['S2BQ3B'] = pd.to_numeric(data['S2BQ3B'], errors='coerce')  # convert_objects is no longer available in current pandas; blanks become NaN
data['S2AQ1'] = pd.to_numeric(data['S2AQ1'])
data['S2DQ1'] = pd.to_numeric(data['S2DQ1'])
data['S2DQ2'] = pd.to_numeric(data['S2DQ2'])
#Subset data to respondents with a recorded episode count (S2BQ3B <= 99) whose father (S2DQ1) and mother (S2DQ2) were both coded 1 (yes)
sub1=data[(data['S2BQ3B']<=99) & (data['S2DQ1']==1) & (data['S2DQ2']==1)]
sub2=sub1.copy()
sub2['S2BQ3B']=sub2['S2BQ3B'].replace(99,np.nan) # NUMBER OF EPISODES OF ALCOHOL ABUSE
sub2['S2DQ1']=sub2['S2DQ1'].replace(9,np.nan) # BLOOD/NATURAL FATHER EVER AN ALCOHOLIC OR PROBLEM DRINKER
sub2['S2DQ2']=sub2['S2DQ2'].replace(9,np.nan) # BLOOD/NATURAL MOTHER EVER AN ALCOHOLIC OR PROBLEM DRINKER
sub2['FAMHIST']=sub2['S2DQ1'] + sub2['S2DQ2']
sub2['FAMHIST']=pd.to_numeric(sub2['FAMHIST'])
sub3=sub2.dropna()
# Using ols function for calculating the F-statistic and associated p value
# OLS - Ordinary least squares
model1 = smf.ols(formula='S2BQ3B ~ C(FAMHIST)', data=sub3).fit()
print(model1.summary())
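One thing I think is worth checking is how many distinct FAMHIST levels are left after the subsetting, since a one-way ANOVA needs at least two groups to compare. Below is a minimal sketch, separate from the assignment code, that computes the F-statistic by hand on toy data. The helper name `one_way_f` and the numbers are made up for illustration, and it is not how statsmodels computes its summary. It only shows that with a single group the between-group degrees of freedom are zero, so F is undefined.

```python
import math
from collections import defaultdict


def one_way_f(values, labels):
    """Hand-computed one-way ANOVA F statistic (hypothetical helper, toy use only)."""
    # Collect the response values by group label.
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)

    n = len(values)
    k = len(groups)
    df_between = k - 1   # zero when only one group is present
    df_within = n - k

    # With fewer than two groups (or no residual degrees of freedom)
    # there is nothing to compare, so F is undefined.
    if k < 2 or df_within <= 0:
        return math.nan

    grand_mean = sum(values) / n
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    ss_within = sum(
        (v - sum(g) / len(g)) ** 2 for g in groups.values() for v in g
    )

    # No within-group variation: the ratio is infinite or undefined.
    if ss_within == 0:
        return math.inf if ss_between > 0 else math.nan

    return (ss_between / df_between) / (ss_within / df_within)


# Two groups: F is well defined.
print(one_way_f([2, 3, 4, 6, 7, 8], [1, 1, 1, 2, 2, 2]))

# One group only (every label is 2): F is undefined.
print(one_way_f([5, 1, 3], [2, 2, 2]))
```

On my real data the equivalent check would presumably be something like `sub3['FAMHIST'].value_counts()`, to see whether more than one level is actually present.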
Attached are the OLS report results for reference. Any help would be greatly appreciated!