I am trying to run a two-way ANOVA to find out how much two variables (B and M) matter for the classification of samples (given by the parameter C).

I am trying to reshape the data frame so that it is suitable for the statsmodels package. However, with pd.melt I have only been able to include one variable at a time (either B or M).

Any suggestion on how I can use the values of both variables in the two-way ANOVA (along the lines of the last two, commented-out lines of the code below) would be a great help.

The values of B, M and C:

B : [10.,4.,4.,6.,5.]
M : [9.,6.,8.,4.,6.]
C : [1.,2.,2.,3.,1.]

import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols
d = pd.read_csv("/Users/Hrihaan/Desktop/Data.txt", sep=r"\s+")
d_melt = pd.melt(d, id_vars=['C'], value_vars=['B'])
#model = ols('C ~ C(B) + C(M) + C(B):C(M)', data=d_melt).fit()
#anova_table = sm.stats.anova_lm(model, typ=2)
StupidWolf
Hrihaan

1 Answer

You were close to the answer:

B = [10.,4.,4.,6.,5.]
M = [9.,6.,8.,4.,6.]
C = [1.,2.,2.,3.,1.]

import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

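# One column per variable; no melting required
d = pd.DataFrame({"B": B, "M": M, "C": C})
# B and M enter as numeric predictors; B:M is their interaction
model = ols("C ~ B + M + B:M", data=d).fit()
anova_table = sm.stats.anova_lm(model, typ=2)
print(anova_table)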

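No reshaping with pd.melt is needed: build a DataFrame with one column per variable, fit the model with a formula that names those columns, then pass the fitted model to anova_lm. Because B and M appear in the formula as plain column names, they are treated as numeric predictors.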
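To see what the formula "C ~ B + M + B:M" asks the model to fit, here is a minimal sketch in plain Python, without statsmodels. It assumes B and M are treated as numeric, in which case the B:M term is the elementwise product of the two columns and an intercept column is added by default. The helper name design_rows is hypothetical and only used for illustration.

```python
# Data from the question.
B = [10., 4., 4., 6., 5.]
M = [9., 6., 8., 4., 6.]
C = [1., 2., 2., 3., 1.]


def design_rows(b_values, m_values):
    """Hypothetical helper: one row [intercept, B, M, B*M] per observation."""
    return [[1.0, b, m, b * m] for b, m in zip(b_values, m_values)]


X = design_rows(B, M)
for row, response in zip(X, C):
    print(row, "->", response)

# Observations minus model columns: what is left over for the residual,
# assuming the four columns are linearly independent.
residual_df = len(X) - len(X[0])
print("residual df:", residual_df)
```

With only five observations and four fitted columns, a single residual degree of freedom remains, so the F-tests in the ANOVA table for this toy data should be read with caution.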

user4624500