
I have a quite specific question about how the ".loc" indexer works on the backend when it is (1) applied directly to a DataFrame (e.g. df.loc[]) as opposed to (2) used inside a defined function that is then applied with "df.apply()".

Here is the MultiIndex dataframe structure I am working with.

[My DataFrame 1]

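import pandas as pd

# Sample function: with axis=1, df.apply() calls it once per row
def sample(row):
    # boolean mask over the columns under the 'deep_impressions' outer label
    val = row.loc['deep_impressions'] > 0
    return val.sum()  # number of True values, i.e. positive entries

df.apply(sample, axis=1)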

The above code uses .loc without specifying rows or columns, passing only the outer column label. When applied to the DataFrame, it returns the expected output: for each row, the sum of the boolean mask over the 2 columns under the "deep_impressions" outer column label, i.e. how many of those 2 values are positive.
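To see what sample actually receives, here is a minimal sketch on made-up data. The real DataFrame is only shown as an image, so only the outer label 'deep_impressions' comes from the question; the 'clicks' group and the 'desktop'/'mobile' inner labels are assumptions. It records the type of each object apply passes in and shows that a single .loc key on that row Series selects the inner columns under 'deep_impressions'.

```python
import pandas as pd

# Hypothetical data: 'deep_impressions' is from the question; 'clicks',
# 'desktop' and 'mobile' are made-up labels. from_product keeps the
# MultiIndex sorted, so partial indexing is straightforward.
columns = pd.MultiIndex.from_product([["clicks", "deep_impressions"], ["desktop", "mobile"]])
df = pd.DataFrame([[1, 0, 0, 3], [5, 2, 0, 0], [0, 0, 4, 2]], columns=columns)

seen = []

def sample(row):
    # Record what apply hands us: with axis=1 this is one row as a Series
    # whose index is the DataFrame's column MultiIndex.
    seen.append(type(row))
    # A Series has a single axis, so one .loc key addresses that axis; a
    # partial key on the MultiIndex selects every inner label under it.
    positive = row.loc["deep_impressions"] > 0
    return positive.sum()

counts = df.apply(sample, axis=1)
print(counts.tolist())  # [1, 0, 2]

# The partial key drops the outer level and leaves the inner labels.
print(df.iloc[0].loc["deep_impressions"].index.tolist())  # ['desktop', 'mobile']
```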

However, when applying the same logic directly to the DataFrame rather than through a defined function, I must explicitly select all rows and only the "deep_impressions" columns:

# all rows, only the 'deep_impressions' column group, then count positives per row
(df.loc[:, 'deep_impressions'] > 0).sum(axis=1)

Why doesn't pandas require me to explicitly write .loc[:, "deep_impressions"] when it is used inside a defined function? How does it work on the backend?
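A small sketch of the DataFrame side, again on made-up data (only 'deep_impressions' is from the question; the other labels are hypothetical). On a DataFrame the first .loc key always refers to the row index, so a bare column label is looked up among the row labels and fails, while the ":" form selects the column group and gives the same per-row counts in one vectorised step.

```python
import pandas as pd

# Hypothetical data: only the outer label 'deep_impressions' is from the question.
columns = pd.MultiIndex.from_product([["clicks", "deep_impressions"], ["desktop", "mobile"]])
df = pd.DataFrame([[1, 0, 0, 3], [5, 2, 0, 0], [0, 0, 4, 2]], columns=columns)

# A DataFrame has two axes; a single .loc key is matched against the row
# labels (here 0, 1, 2), so the column label is not found.
try:
    df.loc["deep_impressions"]
except KeyError:
    print("KeyError: 'deep_impressions' is not a row label")

# ':' keeps all rows; the second key selects the column group. The mask is
# then summed across each row (axis=1) to count positive values.
counts = (df.loc[:, "deep_impressions"] > 0).sum(axis=1)
print(counts.tolist())  # [1, 0, 2]
```

df["deep_impressions"] would select the same column group, since plain [] on a DataFrame looks up column labels.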

Nick ODell
  • This has nothing to do with where `loc` is being called, i.e. inside or outside a function, rather, it is *what* it is being called on. When you pass a function to `.apply`, that function gets passed a `pd.Series` not a `pd.DataFrame`, each `pd.Series` being either a column or a row of that data-frame. – juanpa.arrivillaga Nov 11 '19 at 21:26
  • Python doesn't have dataframes. Presumably you're using Pandas or something? Please add a relevant tag. – ChrisGPT was on strike Nov 11 '19 at 21:26
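juanpa.arrivillaga's point (it depends on what .loc is called on, not where) can be checked with a minimal sketch on hypothetical data: calling the same function by hand on each row Series, obtained with df.iloc[i], gives the same values as df.apply(..., axis=1). This is a conceptual equivalence, not a claim about how apply is implemented internally.

```python
import pandas as pd

# Hypothetical data: only the outer label 'deep_impressions' is from the question.
columns = pd.MultiIndex.from_product([["clicks", "deep_impressions"], ["desktop", "mobile"]])
df = pd.DataFrame([[1, 0, 0, 3], [5, 2, 0, 0], [0, 0, 4, 2]], columns=columns)

def sample(row):
    # Works on any Series whose index has 'deep_impressions' at the outer level.
    return (row.loc["deep_impressions"] > 0).sum()

# apply(axis=1) hands each row to the function as a Series; doing the same
# by hand with df.iloc[i] reproduces the results.
by_apply = df.apply(sample, axis=1).tolist()
by_hand = [sample(df.iloc[i]) for i in range(len(df))]
print(by_apply == by_hand)  # True

# Outside apply the function still works, as long as it is given a row Series.
print(sample(df.iloc[2]))  # 2
```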
