I am trying to run a hypothesis test using an OLS model from statsmodels. I want to model tweet count across the four groups in my data frame: Athletes, CEOs, Politicians, and Celebrities. Each name is labeled with its group in a single column called group.

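import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Stack the four per-group frames and keep only the columns of interest
frames = [CEO_df, athletes_df, Celebrity_df, politicians_df]
final_df = pd.concat(frames)
final_df = final_df.reindex(columns=["name", "group", "tweet_count", "retweet_count", "favorite_count"])
final_df

# One-way ANOVA of tweet_count on the categorical group column
model = ols("tweet_count ~ C(group)", data=final_df).fit()
table = sm.stats.anova_lm(model, typ=2)
print(table)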

I want to do something along the lines of:

model=ols("tweet_count ~ C(Athlete) + C(Celebrity) + C(CEO) + C(Politicians)", data=final_df).fit()
table=sm.stats.anova_lm(model, typ=2)
print(table)

Is that even possible? If not, how else can I run a hypothesis test under those conditions?

Here is my printed final_df:

     name                                          group       tweet_count  retweet_count  favorite_count
0    @aws_cloud @ #ReInvent R “Ray” Wang 王瑞光 #1A  CEO         6            6              0
1    Aaron Levie                                   CEO         48           1140           18624
2    Andrew Mason                                  CEO         24           0              0
3    Bill Gates                                    CEO         114          78204          439020
4    Bill Gross                                    CEO         36           486            1668
...  ...                                           ...         ...          ...            ...
56   Tim Kaine                                     Politician  48           8346           50898
57   Tim O'Reilly                                  Politician  14           28             0
58   Trey Gowdy                                    Politician  12           1314           6780
59   Vice President Mike Pence                     Politician  84           1146408        0
60   klay thompson                                 Politician  48           41676          309924

  • It is possible; please see this [link](https://www.statsmodels.org/devel/examples/notebooks/generated/interactions_anova.html) – Anant Kumar Dec 03 '20 at 07:11
  • Thank you for the link. However, I can't see how to solve my issue from it. Are there any specific steps I can follow to implement this easily? – Noah Skole Dec 03 '20 at 07:21
  • Can you explain clearly what you want to test? Is it each individual level under group? – StupidWolf Dec 04 '20 at 23:00
