
How do I write a function where one of the parameters names a column of a pandas DataFrame?

Reproducible example:

df is a pandas DataFrame with these columns:

Loc Day A  B
1   1   2  4
1   2   4  2
2   3   7  9
3   4   1  9

Working code:

import pandas as pd
from linearmodels.panel import PanelOLS
import statsmodels.api as sm

df = pd.DataFrame({"Loc":[1,1,2,3],"Day":[1,2,3,4],"A":[2,4,7,1],"B":[4,2,9,9]})

def Panel_Regression():
    data = df
    day = pd.Categorical(data.Day)
    data = data.set_index(["Loc", "Day"])
    data["Day"] = day
    exog_vars = ["B"]
    exog = sm.add_constant(data[exog_vars])

    mod = PanelOLS(data.A, exog, entity_effects=True, time_effects=True, drop_absorbed=True)

    fe_te_res = mod.fit()
    print(fe_te_res)

Panel_Regression()

That works, but I want to add parameters to my Panel_Regression function so that I can call it multiple times in a loop.

My problem comes when I try to make the dependent variable (data.A) a parameter of Panel_Regression, like so:

def Panel_Regression(my_variable):
    data = df
    day = pd.Categorical(data.Day)
    data = data.set_index(["Loc", "Day"])
    data["Day"] = day
    exog_vars = ["B"]
    exog = sm.add_constant(data[exog_vars])

    mod = PanelOLS(data.my_variable, exog, entity_effects=True, time_effects=True, drop_absorbed=True)

    fe_te_res = mod.fit()
    print(fe_te_res)

Panel_Regression("A")

I get the error: "'DataFrame' object has no attribute 'my_variable'"

I also tried Panel_Regression(data.A), but that doesn't work either: data is only defined inside the function, so it can't be referenced from outside it.
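A minimal pandas-only sketch of why the first attempt fails, using the df from the question: attribute access takes the name written after the dot literally, while bracket indexing uses the value stored in the variable. The variable names by_label and attribute_error are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"Loc": [1, 1, 2, 3], "Day": [1, 2, 3, 4],
                   "A": [2, 4, 7, 1], "B": [4, 2, 9, 9]})

my_variable = "A"

# Bracket indexing uses the *value* of my_variable ("A") as the column label.
by_label = df[my_variable]

# Attribute access uses the literal name "my_variable", which is not a column,
# so pandas raises AttributeError (the error from the question).
try:
    df.my_variable
    attribute_error = None
except AttributeError as err:
    attribute_error = str(err)

print(by_label.tolist())
print(attribute_error)
```

So inside the function, the parameter should hold the column name as a string and be used with brackets.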

I assume I'm missing something basic about passing this into my own function. Also, if there's a better way to title this post, I'm happy to change it.

Thank you!

Maxim Lott
  • Terminology note, one would say "function" or "a function definition", not a "def" or a "definition". Also note, very important, `data = df` **does not copy your dataframe** and any mutator operations will affect the dataframe referenced by `df` – juanpa.arrivillaga Feb 01 '22 at 19:39
  • Good to know, thanks. I've edited the terminology as you suggest. – Maxim Lott Feb 01 '22 at 19:42
  • `data[my_variable]`? – BigBen Feb 01 '22 at 19:44
  • @BigBen, that worked, thank you! If you want to post it as an answer, I will mark it correct. If there's a way to explain the difference between data[my_variable] and data.my_variable, I'd also be interested to know. – Maxim Lott Feb 01 '22 at 19:47
  • https://stackoverflow.com/questions/46066026/proper-way-to-access-a-column-of-a-pandas-dataframe ... and I'm guessing this question is a duplicate but no time to look for a dupe target. – BigBen Feb 01 '22 at 19:48
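To illustrate the point about `data = df` in the first comment, here is a small sketch with a cut-down df (columns chosen only for the demo). Assignment gives the same DataFrame a second name; only `.copy()` creates an independent frame. In the question's function, `data` is rebound by `set_index` (which returns a new DataFrame by default) before any column is assigned, so df is not modified there; the risk applies to any mutation made before such a rebinding.

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 4, 7, 1], "B": [4, 2, 9, 9]})

data = df                    # a second name for the same object, not a copy
same_object = data is df

data["C"] = 0                # adding a column through `data`...
c_in_df = "C" in df.columns  # ...also changes df

safe = df.copy()             # an independent copy
safe["D"] = 1
d_in_df = "D" in df.columns  # df is untouched by changes to `safe`

print(same_object, c_in_df, d_in_df)
```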

1 Answer


Per BigBen in the comments, the answer is:

Replace data.my_variable with data[my_variable].

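That worked. Bracket indexing is the more general way to select a column: data[my_variable] looks up the column whose label is the value stored in my_variable (here "A"), whereas data.my_variable looks for an attribute literally named my_variable. Brackets also work for labels that are not valid Python identifiers or that clash with DataFrame methods (e.g. a column named "count"). See: Proper way to access a column of a pandas dataframe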
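A sketch of how the fix fits the looping goal, assuming pandas only. build_panel_inputs is a hypothetical helper, not part of linearmodels; it prepares the dependent variable and regressors by column name, and the PanelOLS call from the question is left as a comment so the sketch runs without linearmodels or statsmodels installed. The manual "const" column is a stand-in for sm.add_constant, which also prepends a column named "const" by default.

```python
import pandas as pd

df = pd.DataFrame({"Loc": [1, 1, 2, 3], "Day": [1, 2, 3, 4],
                   "A": [2, 4, 7, 1], "B": [4, 2, 9, 9]})


def build_panel_inputs(frame, dependent, exog_vars):
    """Hypothetical helper: return (y, exog) for a panel regression.

    `dependent` is a column *name* (a string), so it is looked up with
    brackets, frame[dependent], never frame.dependent.
    """
    data = frame.copy()                    # work on a copy, leave `frame` alone
    day = pd.Categorical(data["Day"])
    data = data.set_index(["Loc", "Day"])  # entity/time MultiIndex
    data["Day"] = day
    y = data[dependent]                    # bracket lookup by the string's value
    exog = data[exog_vars].copy()
    exog.insert(0, "const", 1.0)           # stand-in for sm.add_constant(...)
    return y, exog


# Loop over dependent variables by name; the other column is the regressor.
results = {}
for name in ["A", "B"]:
    exog_vars = [c for c in ["A", "B"] if c != name]
    y, exog = build_panel_inputs(df, name, exog_vars)
    results[name] = (y, exog)
    # With linearmodels installed, the question's call would go here:
    # PanelOLS(y, exog, entity_effects=True, time_effects=True,
    #          drop_absorbed=True).fit()
```

The original Panel_Regression works the same way once its PanelOLS line uses data[my_variable] and it is called as Panel_Regression("A").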

Maxim Lott