AttributeError: 'tuple' object has no attribute 'loc' when filtering on pandas dataframe

Question

Given the following DataFrame -

json_path	Reporting Group	Entity/Grouping	Entity ID	Adjusted Value (Today, No Div, USD)	Adjusted TWR (Current Quarter, No Div, USD)	Adjusted TWR (YTD, No Div, USD)	Annualized Adjusted TWR (Since Inception, No Div, USD)	Adjusted Value (No Div, USD)
data.attributes.total.children.[0].children.[0].children.[0]	Barrack Family	William and Rupert Trust	9957007	-1.44				-1.44
data.attributes.total.children.[0].children.[0].children.[0].children.[0]	Barrack Family	Cash	-	-1.44				-1.44
data.attributes.total.children.[0].children.[0].children.[1]	Barrack Family	Gratia Holdings No. 2 LLC	8413655	55491732.66	-0.971018847	-0.971018847	11.52490309	55491732.66
data.attributes.total.children.[0].children.[0].children.[1].children.[0]	Barrack Family	Investment Grade Fixed Income	-	18469768.6				18469768.6
data.attributes.total.children.[0].children.[0].children.[1].children.[1]	Barrack Family	High Yield Fixed Income	-	3668982.44	-0.205356545	-0.205356545	4.441190127	3668982.44

The following code should filter the DataFrame down to rows where the Entity/Grouping column is not 'Cash' and at least one of the Adjusted TWR (Current Quarter, No Div, USD), Adjusted TWR (YTD, No Div, USD) or Annualized Adjusted TWR (Since Inception, No Div, USD) columns is blank.

Code: The following code is intended to do this -

def twr_exceptions_logic():
    perf_asset_class_df = databases_creation()

    m1 = perf_asset_class_df.loc[(perf_asset_class_df['Entity/Grouping']!= 'Cash')]
    m2 = perf_asset_class_df[['Adjusted TWR (Current Quarter, No Div, USD)',
                              'Adjusted TWR (YTD, No Div, USD)',
                              'Annualized Adjusted TWR (Since Inception, No Div, USD)']].eq('').any(1)
    perf_asset_class_df.loc[m1&m2]
    
    return perf_asset_class_df
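Independently of the AttributeError, m1 as written holds the result of .loc (a filtered DataFrame) rather than a boolean mask, so m1&m2 would not select the intended rows even on a real DataFrame, and the result of the final .loc[m1&m2] is never assigned. A minimal sketch of the intended filter follows. It assumes blank cells arrive either as empty strings (as the original .eq('') implies) or as NaN; the sample rows are the asset-class rows from the table above, reduced to the columns the filter needs.

```python
import pandas as pd

# Sample rows from the question's table (asset-class level only).
# Blank TWR cells are represented as empty strings here (an assumption).
twr_cols = [
    'Adjusted TWR (Current Quarter, No Div, USD)',
    'Adjusted TWR (YTD, No Div, USD)',
    'Annualized Adjusted TWR (Since Inception, No Div, USD)',
]
perf_asset_class_df = pd.DataFrame({
    'Entity/Grouping': ['Cash', 'Investment Grade Fixed Income', 'High Yield Fixed Income'],
    twr_cols[0]: ['', '', -0.205356545],
    twr_cols[1]: ['', '', -0.205356545],
    twr_cols[2]: ['', '', 4.441190127],
})

# m1: boolean Series, True where the row is not 'Cash'.
# Wrapping this in .loc[...] would instead return a filtered DataFrame.
m1 = perf_asset_class_df['Entity/Grouping'] != 'Cash'

# m2: True where at least one TWR column is blank (empty string or NaN).
twr = perf_asset_class_df[twr_cols]
m2 = (twr.eq('') | twr.isna()).any(axis=1)

# .loc returns a new DataFrame; assign it, otherwise the result is discarded.
exceptions_df = perf_asset_class_df.loc[m1 & m2]
print(exceptions_df['Entity/Grouping'].tolist())  # prints ['Investment Grade Fixed Income']
```

Passing axis=1 by keyword rather than positionally also makes the row-wise intent of .any explicit.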

Error: being relatively new to Python, I am unsure why this AttributeError is being raised -

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
C:\Users\WILLIA~1.FOR\AppData\Local\Temp/ipykernel_18756/2689024934.py in <module>
     48     writer.save()
     49 
---> 50 xlsx_writer()

C:\Users\WILLIA~1.FOR\AppData\Local\Temp/ipykernel_18756/2689024934.py in xlsx_writer()
      1 # Function that writes Exceptions Report and API Response as a consolidated .xlsx file.
      2 def xlsx_writer():
----> 3     reporting_group_df, unknown_df, perf_asset_class_df, perf_entity_df, perf_entity_group_df = twr_exceptions_logic()
      4 
      5 #   Creating and defining filename for exceptions report

C:\Users\WILLIA~1.FOR\AppData\Local\Temp/ipykernel_18756/2834095962.py in twr_exceptions_logic()
      2     perf_asset_class_df = databases_creation()
      3 
----> 4     m1 = perf_asset_class_df.loc[(perf_asset_class_df['Entity/Grouping']!= 'Cash')]
      5     m2 = perf_asset_class_df[['Adjusted TWR (Current Quarter, No Div, USD)',
      6                               'Adjusted TWR (YTD, No Div, USD)',

AttributeError: 'tuple' object has no attribute 'loc'

Help: I have done some research on this AttributeError and am finding conflicting information about how it relates to my particular issue. It looks as if perf_asset_class_df is being returned as a tuple from the databases_creation() function. However, it is definitely a pandas DataFrame: the only thing databases_creation() does is take a DataFrame named df and filter it to create a DataFrame called perf_asset_class_df. Or am I missing something?

perf_asset_class_df = df[df['json_path'].str.contains(r'(?:\.children\.\[\d+\]){4}')]

databases_creation() function -

def databases_creation():
    df = data_cleansing()

    unknown_df = df[df['Entity/Grouping'].str.contains('Unknown')==True]

    perf_asset_class_df = df[df['json_path'].str.contains(r'(?:\.children\.\[\d+\]){4}')]
    perf_asset_class_df = pd.DataFrame(perf_asset_class_df)
    
    perf_entity_df = df[df['json_path'].str.count(r'\.children').eq(3)]
    perf_entity_group_df = df[df['json_path'].str.count(r'\.children').eq(2)]

    return reporting_group_df, unknown_df, perf_asset_class_df, perf_entity_df, perf_entity_group_df
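To see which rows each json_path filter in databases_creation() picks up, here is a small sketch that applies the same two patterns to the json_path values from the sample table in the question (only that column is reproduced):

```python
import pandas as pd

# json_path values copied from the sample table in the question.
paths = pd.Series([
    'data.attributes.total.children.[0].children.[0].children.[0]',
    'data.attributes.total.children.[0].children.[0].children.[0].children.[0]',
    'data.attributes.total.children.[0].children.[0].children.[1]',
    'data.attributes.total.children.[0].children.[0].children.[1].children.[0]',
    'data.attributes.total.children.[0].children.[0].children.[1].children.[1]',
])

# Four consecutive ".children.[n]" segments -> asset-class rows.
# str.contains searches anywhere in the string, so deeper paths would also match.
asset_class_mask = paths.str.contains(r'(?:\.children\.\[\d+\]){4}')

# Exactly three ".children" occurrences -> entity rows.
entity_mask = paths.str.count(r'\.children').eq(3)

print(asset_class_mask.tolist())  # [False, True, False, True, True]
print(entity_mask.tolist())       # [True, False, True, False, False]
```

On this sample, the asset-class filter keeps the Cash, Investment Grade Fixed Income and High Yield Fixed Income rows, while the entity filter keeps the two trust/LLC rows.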

Does anyone have any suggestions?

databases_creation() returns a tuple, not a pandas dataframe apparently. — LukasNeugebauer, Mar 07 '22 at 17:39
I checked the `databases_creation()` function and by using `perf_asset_class_df.shape` I was able to confirm it is a pandas dataframe. I tried 'forcing' a pandas dataframe by `perf_asset_class_df = pd.DataFrame(perf_asset_class_df)` but this didn't work either. — William, Mar 07 '22 at 20:35
Would help if you could post the `databases_creation()` function. You might have like a trailing comma after your return statement or something that turns it into a tuple. — Jeff, Mar 07 '22 at 20:38
It returns a tuple of data frames. If you only want the first one you have to change the function to only return one dataframe or index the output. — LukasNeugebauer, Mar 07 '22 at 20:41
I think my lack of experience is clearly showing, @LukasNeugebauer. So you are saying that `return reporting_group_df, unknown_df, perf_asset_class_df, perf_entity_df, perf_entity_group_df` is not returning each respective pandas dataframe? I know we are deviating from the question, but how would I simply return the dataframes? Clearly I've messed up big time. — William, Mar 07 '22 at 20:46
You are returning the data frames, but if you're returning multiple things and only have one output argument, then all of the outputs are stored as a tuple in the one output variable. The outside doesn't know what the variables on the inside of the function are or how they are called. Either only return one df or do `databases_creation()[2]` if you want the third df that the function returns. — LukasNeugebauer, Mar 07 '22 at 20:49
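The behaviour described in these comments can be reproduced without any of the real data. The sketch below uses a hypothetical make_frames() function as a stand-in for databases_creation(): returning several values packs them into one tuple, which can then be indexed or unpacked.

```python
import pandas as pd

def make_frames():
    """Hypothetical stand-in for databases_creation(): returns three DataFrames."""
    a = pd.DataFrame({'x': [1]})
    b = pd.DataFrame({'x': [2]})
    c = pd.DataFrame({'x': [3]})
    return a, b, c  # the commas pack the three frames into a single tuple

result = make_frames()
print(type(result))  # <class 'tuple'>

try:
    result.loc  # a tuple has no .loc attribute
except AttributeError as exc:
    print(exc)  # 'tuple' object has no attribute 'loc'

by_index = make_frames()[2]                   # indexing picks one element
first, second, by_unpacking = make_frames()   # unpacking names every element
print(isinstance(by_index, pd.DataFrame), by_index.equals(by_unpacking))  # True True
```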

score 3 · Accepted Answer · answered Mar 07 '22 at 20:47

return reporting_group_df, unknown_df, perf_asset_class_df, perf_entity_df, perf_entity_group_df

This line returns a tuple of data frames. When your code calls databases_creation(), it saves this entire tuple as perf_asset_class_df, which is why .loc fails. If you only want that one data frame, you'll need to unpack the tuple when you call the function:

_, _, perf_asset_class_df, _, _ = databases_creation()

This unpacks the tuple, saving each element to the corresponding variable. By convention we use _ for the parts we don't care about, but any other variable name would work.
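One more thing to watch after this fix: xlsx_writer() in the traceback unpacks five values from twr_exceptions_logic(). As written, twr_exceptions_logic() returns whatever databases_creation() returned, i.e. the five-element tuple. Once perf_asset_class_df is unpacked to a single DataFrame, returning only that frame would break the five-name assignment, because unpacking a DataFrame iterates over its column labels and a frame with more than five columns raises ValueError. The sketch below keeps the five-value shape. It assumes a hypothetical stub databases_creation() standing in for the real one, and it assumes the filtered rows should replace perf_asset_class_df in the report.

```python
import pandas as pd

TWR_COLS = [
    'Adjusted TWR (Current Quarter, No Div, USD)',
    'Adjusted TWR (YTD, No Div, USD)',
    'Annualized Adjusted TWR (Since Inception, No Div, USD)',
]

def databases_creation():
    """Hypothetical stub: five DataFrames in the same order as the real function."""
    perf_asset_class_df = pd.DataFrame({
        'Entity/Grouping': ['Cash', 'Investment Grade Fixed Income', 'High Yield Fixed Income'],
        TWR_COLS[0]: ['', '', -0.205356545],
        TWR_COLS[1]: ['', '', -0.205356545],
        TWR_COLS[2]: ['', '', 4.441190127],
    })
    return pd.DataFrame(), pd.DataFrame(), perf_asset_class_df, pd.DataFrame(), pd.DataFrame()

def twr_exceptions_logic():
    # Unpack the tuple so every name refers to a single DataFrame.
    (reporting_group_df, unknown_df, perf_asset_class_df,
     perf_entity_df, perf_entity_group_df) = databases_creation()

    # Boolean masks, combined with & and applied once via .loc.
    m1 = perf_asset_class_df['Entity/Grouping'] != 'Cash'
    m2 = perf_asset_class_df[TWR_COLS].eq('').any(axis=1)
    perf_asset_class_df = perf_asset_class_df.loc[m1 & m2]

    # Return all five frames so the five-name unpacking in xlsx_writer() still works.
    return reporting_group_df, unknown_df, perf_asset_class_df, perf_entity_df, perf_entity_group_df

reporting_group_df, unknown_df, perf_asset_class_df, perf_entity_df, perf_entity_group_df = twr_exceptions_logic()
print(perf_asset_class_df['Entity/Grouping'].tolist())  # ['Investment Grade Fixed Income']
```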
