Python .loc within function VS outside the function

Question

I have a fairly specific question about how ".loc" works on the backend when it is applied directly to a DataFrame (e.g. df.loc[]), as opposed to being used inside a defined function that is then applied using "df.apply()".

Here is the MultiIndex dataframe structure I am working with.

[My DataFrame 1]

# Sample function
def sample(df):
    # Flag the 'deep_impressions' values that are greater than 0, then count them
    val = df.loc['deep_impressions'] > 0
    return val.sum()

df.apply(sample, axis=1)

The above code uses .loc without any row/column indication, simply passing the outer column label. When applied to the DataFrame, it returns the correct output: for each row, the number of values greater than 0 in the 2 columns under the "deep_impressions" outer column index.

However, when applying the same logic without a defined function, I must explicitly state that all rows, and only the "deep_impressions" columns, are to be used.

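# Select all rows and only the 'deep_impressions' columns, then count values > 0 per row
result = (df.loc[:, 'deep_impressions'] > 0).sum(axis=1)
result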

Why doesn't Python require me to explicitly state .loc[:, "deep_impressions"] when it is used in a defined function? How does it work on the backend?

This has nothing to do with where `loc` is being called, i.e. inside or outside a function, rather, it is *what* it is being called on. When you pass a function to `.apply`, that function gets passed a `pd.Series` not a `pd.DataFrame`, each `pd.Series` being either a column or a row of that data-frame. — juanpa.arrivillaga, Nov 11 '19 at 21:26
Python doesn't have dataframes. Presumably you're using Pandas or something? Please add a relevant tag. — ChrisGPT was on strike, Nov 11 '19 at 21:26
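
To make the point in the first comment concrete, here is a minimal sketch assuming pandas. The data is hypothetical: the inner labels `a` and `b` and the `clicks` group are made up for illustration, since the original DataFrame is only shown as an image. It records the type of the object that `df.apply(..., axis=1)` hands to the function and compares the result with the explicit DataFrame form.

```python
import pandas as pd

# Hypothetical stand-in for the asker's frame: two-level column labels.
# The inner labels 'a'/'b' and the 'clicks' group are assumptions.
columns = pd.MultiIndex.from_tuples(
    [("clicks", "a"), ("deep_impressions", "a"), ("deep_impressions", "b")]
)
df = pd.DataFrame([[5, 1, 0], [0, 2, 3], [7, 0, 0]], columns=columns)

seen_types = []

def sample(row):
    # With axis=1, `row` is a Series whose index is the column MultiIndex.
    # A Series has a single axis, so one label is enough for .loc.
    seen_types.append(type(row).__name__)
    return int((row.loc["deep_impressions"] > 0).sum())

via_apply = df.apply(sample, axis=1)

# A DataFrame has two axes, and .loc indexes rows first,
# so the column label has to go in the second position.
direct = (df.loc[:, "deep_impressions"] > 0).sum(axis=1)

print(set(seen_types))
print(via_apply.tolist())
print(direct.tolist())
```

On this frame, `df.loc["deep_impressions"]` would be looked up among the row labels and raise a `KeyError`, which is why the explicit `[:, ...]` form is needed outside of `apply`.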
