
I'm working on an assignment for the Data Analysis Tools course on Coursera and I've hit a wall with my code. The assignment is to run an analysis of variance (ANOVA) to compare the means of groups. Using data from the NESARC study, I'm trying to test whether the number of episodes of alcohol abuse is related to a family history of alcohol abuse.

My quantitative response variable is S2BQ3B, the number of episodes of alcohol abuse (1-99), and my explanatory variable is 'FAMHIST', which I built by adding S2DQ1 and S2DQ2 together, since together they should capture both the father and the mother being reported as alcohol abusers.

When I fit the model and print the OLS summary, I get inf for the F-statistic and nan for the p-value. I added a .dropna() to my dataset, but that does not seem to have helped.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
import statsmodels.stats.multicomp as multi

data = pd.read_csv('nesarc_pds.csv', low_memory=False)

#Setting variables to numeric
data['S2BQ3B'] = pd.to_numeric(data['S2BQ3B'], errors='coerce')  # convert_objects is deprecated; coerce non-numeric values to NaN
data['S2AQ1'] = pd.to_numeric(data['S2AQ1'])
data['S2DQ1'] = pd.to_numeric(data['S2DQ1'])
data['S2DQ2'] = pd.to_numeric(data['S2DQ2'])

#Subset data to exclude anyone who has never drunk in their lifetime, or who has no alcoholic episodes in their family history
sub1=data[(data['S2BQ3B']<=99) & (data['S2DQ1']==1) & (data['S2DQ2']==1)]
sub2=sub1.copy()

sub2['S2BQ3B']=sub2['S2BQ3B'].replace(99,np.nan) # NUMBER OF EPISODES OF ALCOHOL ABUSE
sub2['S2DQ1']=sub2['S2DQ1'].replace(9,np.nan) # BLOOD/NATURAL FATHER EVER AN ALCOHOLIC OR PROBLEM DRINKER
sub2['S2DQ2']=sub2['S2DQ2'].replace(9,np.nan) # BLOOD/NATURAL MOTHER EVER AN ALCOHOLIC OR PROBLEM DRINKER


sub2['FAMHIST']=sub2['S2DQ1'] + sub2['S2DQ2']
sub2['FAMHIST']=pd.to_numeric(sub2['FAMHIST'])

sub3=sub2.dropna()

# Using ols function for calculating the F-statistic and associated p value
# OLS - Ordinary least squares
model1 = smf.ols(formula='S2BQ3B ~ C(FAMHIST)', data=sub3).fit()
print(model1.summary())

Attached are the OLS report results for reference. Any help would be greatly appreciated!

My OLS report results

ricopella
  • You are only regressing on a constant. Is this what you intended? What does your `sub3['FAMHIST']` in the OLS regression look like? Or are the OLS results from a different regression? You can copy and paste regression results into the question so we can see them directly. – Josef Aug 28 '16 at 22:00
  • To answer the question as it is now: If you only regress on a constant, then the f-value doesn't make sense because it reflects a hypothesis test that doesn't impose any restriction. I don't know whether the `inf` makes sense, because I think this corner case has never been checked. – Josef Aug 28 '16 at 22:06
  • Thank you for responding. You were correct. I was using a constant variable. I had to pick a new variable in order to finish the assignment. – ricopella Aug 31 '16 at 04:53
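
The point made in the comments can be checked directly. Below is a minimal sketch on made-up toy data, not the real NESARC file. It assumes the coding 1 = yes, 2 = no, 9 = unknown for S2DQ1 and S2DQ2 (the "2 = no" part is an assumption not stated in the question). Filtering on `S2DQ1 == 1` and `S2DQ2 == 1` before summing leaves every row with FAMHIST equal to 2, so `C(FAMHIST)` has a single level and the model reduces to a constant. The second half shows one possible alternative recoding (count of parents coded "yes"), which is only an illustration and not necessarily what the assignment requires.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; values are invented for illustration only.
# Assumed coding for S2DQ1/S2DQ2: 1 = yes, 2 = no, 9 = unknown.
toy = pd.DataFrame({
    "S2BQ3B": [3, 1, 99, 5, 2, 4, 1, 7],
    "S2DQ1":  [1, 1, 1, 2, 2, 1, 9, 2],
    "S2DQ2":  [1, 1, 1, 1, 2, 2, 2, 9],
})

# Filter as written in the question: both parents must be coded 1.
strict = toy[(toy["S2BQ3B"] <= 99) & (toy["S2DQ1"] == 1) & (toy["S2DQ2"] == 1)].copy()
strict["FAMHIST"] = strict["S2DQ1"] + strict["S2DQ2"]
strict_levels = strict["FAMHIST"].nunique()  # a single level: every row sums to 2

# One possible alternative (an assumption, not the course's prescribed method):
# keep both yes and no answers, drop unknowns, and count the parents coded "yes".
loose = toy[toy["S2BQ3B"] <= 99].copy()
loose["S2BQ3B"] = loose["S2BQ3B"].replace(99, np.nan)
loose[["S2DQ1", "S2DQ2"]] = loose[["S2DQ1", "S2DQ2"]].replace(9, np.nan)
loose = loose.dropna(subset=["S2BQ3B", "S2DQ1", "S2DQ2"])
loose["FAMHIST"] = (loose["S2DQ1"] == 1).astype(int) + (loose["S2DQ2"] == 1).astype(int)
loose_levels = loose["FAMHIST"].nunique()  # several levels, so groups can be compared

print(strict["FAMHIST"].value_counts())
print(loose["FAMHIST"].value_counts())
```

Running `sub3['FAMHIST'].value_counts()` on the real data before fitting is a quick way to confirm that the explanatory variable has more than one group.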
