Given the following DataFrame -

json_path Reporting Group Entity/Grouping Entity ID Adjusted Value (Today, No Div, USD) Adjusted TWR (Current Quarter, No Div, USD) Adjusted TWR (YTD, No Div, USD) Annualized Adjusted TWR (Since Inception, No Div, USD) Adjusted Value (No Div, USD) TWR Audit Note
data.attributes.total.children.[0].children.[0].children.[0] Barrack Family William and Rupert Trust 9957007 -1.44 -1.44
data.attributes.total.children.[0].children.[0].children.[0].children.[0] Barrack Family Cash - -1.44 -1.44
data.attributes.total.children.[0].children.[0].children.[1] Barrack Family Gratia Holdings No. 2 LLC 8413655 55491732.66 -0.971018847 -0.971018847 11.52490309 55491732.66
data.attributes.total.children.[0].children.[0].children.[1].children.[0] Barrack Family Investment Grade Fixed Income - 18469768.6 18469768.6
data.attributes.total.children.[0].children.[0].children.[1].children.[1] Barrack Family High Yield Fixed Income - 3668982.44 -0.205356545 -0.205356545 4.441190127 3668982.44

The following code should select the rows whose Entity/Grouping value is not 'Cash' and that have a blank value in any of these columns: Adjusted TWR (Current Quarter, No Div, USD), Adjusted TWR (YTD, No Div, USD), or Annualized Adjusted TWR (Since Inception, No Div, USD).

Code: This is the code I expected to achieve this -

def twr_exceptions_logic():
    perf_asset_class_df = databases_creation()

    m1 = perf_asset_class_df.loc[(perf_asset_class_df['Entity/Grouping']!= 'Cash')]
    m2 = perf_asset_class_df[['Adjusted TWR (Current Quarter, No Div, USD)',
                              'Adjusted TWR (YTD, No Div, USD)',
                              'Annualized Adjusted TWR (Since Inception, No Div, USD)']].eq('').any(1)
    perf_asset_class_df.loc[m1&m2]
    
    return perf_asset_class_df

Error: Being still relatively new to Python, I am unsure why this AttributeError is being raised -

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
C:\Users\WILLIA~1.FOR\AppData\Local\Temp/ipykernel_18756/2689024934.py in <module>
     48     writer.save()
     49 
---> 50 xlsx_writer()

C:\Users\WILLIA~1.FOR\AppData\Local\Temp/ipykernel_18756/2689024934.py in xlsx_writer()
      1 # Function that writes Exceptions Report and API Response as a consolidated .xlsx file.
      2 def xlsx_writer():
----> 3     reporting_group_df, unknown_df, perf_asset_class_df, perf_entity_df, perf_entity_group_df = twr_exceptions_logic()
      4 
      5 #   Creating and defining filename for exceptions report

C:\Users\WILLIA~1.FOR\AppData\Local\Temp/ipykernel_18756/2834095962.py in twr_exceptions_logic()
      2     perf_asset_class_df = databases_creation()
      3 
----> 4     m1 = perf_asset_class_df.loc[(perf_asset_class_df['Entity/Grouping']!= 'Cash')]
      5     m2 = perf_asset_class_df[['Adjusted TWR (Current Quarter, No Div, USD)',
      6                               'Adjusted TWR (YTD, No Div, USD)',

AttributeError: 'tuple' object has no attribute 'loc'

Help: I have done some research on this AttributeError and am finding conflicting information about how it relates to my particular issue. It looks as if perf_asset_class_df is being returned as a tuple from the databases_creation() function. However, it is definitely a pandas dataframe, and the only thing databases_creation() does is take a dataframe named df and filter it to create a pandas dataframe called perf_asset_class_df. Or am I missing something?

perf_asset_class_df = df[df['json_path'].str.contains(r'(?:\.children\.\[\d+\]){4}')]

databases_creation() function -

def databases_creation():
    df = data_cleansing()

    unknown_df = df[df['Entity/Grouping'].str.contains('Unknown')==True]

    perf_asset_class_df = df[df['json_path'].str.contains(r'(?:\.children\.\[\d+\]){4}')]
    perf_asset_class_df = pd.DataFrame(perf_asset_class_df)
    
    perf_entity_df = df[df['json_path'].str.count(r'\.children').eq(3)]
    perf_entity_group_df = df[df['json_path'].str.count(r'\.children').eq(2)]

    return reporting_group_df, unknown_df, perf_asset_class_df, perf_entity_df, perf_entity_group_df

Does anyone have any suggestions?

William
  • databases_creation() returns a tuple, not a pandas dataframe apparently. – LukasNeugebauer Mar 07 '22 at 17:39
  • I checked the `databases_creation()` function and by using `perf_asset_class_df.shape` I was able to confirm it is a pandas dataframe. I tried 'forcing' a pandas dataframe by `perf_asset_class_df = pd.DataFrame(perf_asset_class_df)` but this didn't work either. – William Mar 07 '22 at 20:35
  • Would help if you could post the `databases_creation()` function. You might have like a trailing comma after your return statement or something that turns it into a tuple. – Jeff Mar 07 '22 at 20:38
  • Good point, @Jeff I have inc. in the Question. Thanks! – William Mar 07 '22 at 20:40
  • It returns a tuple of data frames. If you only want the first one you have to change the function to only return one dataframe or index the output. – LukasNeugebauer Mar 07 '22 at 20:41
  • I think my lack of experience is clearly showing @LukasNeugebauer So you are saying that `return reporting_group_df, unknown_df, perf_asset_class_df, perf_entity_df, perf_entity_group_df` is not returning each respective pandas dataframe? I know we are deviating from the question, however how would I simply return the dataframes. Clearly I've messed up big time. – William Mar 07 '22 at 20:46
  • You are returning the data frames, but if you're returning multiple things and only have one output argument, then all of the outputs are stored as a tuple in the one output variable. The outside doesn't know what the variables on the inside of the function are or how they are called. Either only return one df or do databases_creation()[2] if you want the third df that the function returns – LukasNeugebauer Mar 07 '22 at 20:49

1 Answer

return reporting_group_df, unknown_df, perf_asset_class_df, perf_entity_df, perf_entity_group_df

This line returns a tuple of data frames. You'll need to unpack it when you call the function to get the data frame you're interested in. When your code calls databases_creation() it saves this entire tuple as perf_asset_class_df. If you only want that data frame you'll need to unpack it:

_, _, perf_asset_class_df, _, _ = databases_creation()

This unpacks the tuple, assigning each element to the variable in the matching position. By convention, _ is used for the elements we don't need, but any variable name would work.
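For illustration, here is a minimal, self-contained sketch of the same idea. The databases_creation() below is a hypothetical two-frame stand-in for the function in the question, and the sample rows are made up. It also assumes that blanks are stored as empty strings and that the goal is to keep the non-Cash rows with a blank TWR value. Note that in the question m1 is a filtered DataFrame rather than a boolean mask, so the sketch builds two boolean Series and combines them instead.

```python
import pandas as pd

TWR_COLS = [
    "Adjusted TWR (Current Quarter, No Div, USD)",
    "Adjusted TWR (YTD, No Div, USD)",
    "Annualized Adjusted TWR (Since Inception, No Div, USD)",
]


def databases_creation():
    # Hypothetical stand-in: builds two small frames and returns them together.
    unknown_df = pd.DataFrame({"Entity/Grouping": ["Unknown"]})
    perf_asset_class_df = pd.DataFrame({
        "Entity/Grouping": [
            "Cash",
            "Investment Grade Fixed Income",
            "High Yield Fixed Income",
        ],
        TWR_COLS[0]: ["", "", "-0.205356545"],
        TWR_COLS[1]: ["", "", "-0.205356545"],
        TWR_COLS[2]: ["", "", "4.441190127"],
    })
    # A comma-separated return packs the values into a single tuple.
    return unknown_df, perf_asset_class_df


# Assigning to one name keeps the whole tuple, which has no .loc attribute.
result = databases_creation()
print(type(result))  # <class 'tuple'>

# Unpacking assigns each element to its own name.
_, perf_asset_class_df = databases_creation()

# Build two boolean masks (one value per row) and combine them with &.
not_cash = perf_asset_class_df["Entity/Grouping"] != "Cash"
blank_twr = perf_asset_class_df[TWR_COLS].eq("").any(axis=1)

# Assign the filtered result; .loc does not modify the frame in place.
exceptions_df = perf_asset_class_df.loc[not_cash & blank_twr]
print(exceptions_df["Entity/Grouping"].tolist())  # ['Investment Grade Fixed Income']
```

If the blanks in the real data are NaN rather than empty strings, .isna() would probably be the check to use in place of .eq(""). The traceback also shows xlsx_writer() unpacking five values from twr_exceptions_logic(), so that function would need to return five frames as well.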

Jeff