0

I have a dataframe that is populated via a mySQL query. I convert this to an array to try and calculate the irr. The sorting and groupby functions work as intended, but I cannot seem to get around this error. I looked at this issue, but it doesn't seem to apply in my case.

I have tried to pass a list, DataFrame, and pandas table, but the function is looking for an array so that's what I send it. Any help on this is appreciated.

Thanks!

This is the dataset produced as the flows_by_year variable
                     sum
EffectiveDate           
2017          -3660000.0
2018          -5520000.0
2019          -2460213.0
2020           1600000.0


import pandas as pd
import sqlite3
from sqlite3 import Error
import numpy_financial as npf

def IRR(fund, pool):
    database = r'C:\Transaction_History.db'
    conn = create_connection(database)
    sql_flows = ('''SELECT EFFECTIVEDATE, TRANSACTIONAMOUNT FROM Trans_Hist WHERE FUND=''' + 
                 '"' + fund + '"' + ' AND POOL=' + '"' + pool + '"' + 
                 ' AND (TRANSACTIONTYPE = "Funding" OR TRANSACTIONTYPE = "Cash");')
    flows = pd.read_sql_query(sql_flows, conn, parse_dates=['EffectiveDate'])
    flows.sort_values(by=['EffectiveDate'], inplace=True, ascending=True)
    flows_by_year = flows.groupby(flows['EffectiveDate'].dt.year)['TransactionAmount'].agg(['sum'])
    print(flows_by_year)
    irr = round(npf.irr(flows_by_year.values), 4) * 100
    irr = f'{irr:,.2f}'
    print(irr)
    return irr
AJames
  • 81
  • 1
  • 1
  • 10

1 Answers1

1

From the docs, the function numpy_financial.irr() consumes a one-dimensional array-like object. However, you are passing flows_by_year.values which is two-dimensional. Use np.squeeze(flows_by_year.values) or flows_by_year['sum'] instead.

Demo:

In [193]: import pandas as pd
     ...: import numpy_financial as npf
     ...: import numpy as np

In [194]: data = np.asarray([(2017, -3660000.0),
     ...:                    (2018, -5520000.0),
     ...:                    (2019, -2460213.0),
     ...:                    (2020, 1600000.0)])

In [195]: df = pd.DataFrame(data, columns=['EffectiveDate', 'sum'])

In [196]: flows_by_year = df.set_index('EffectiveDate')

In [197]: flows_by_year.values
Out[197]: 
array([[-3660000.],
       [-5520000.],
       [-2460213.],
       [ 1600000.]])

In [198]: np.squeeze(flows_by_year.values)
Out[198]: array([-3660000., -5520000., -2460213.,  1600000.])

In [199]: flows_by_year['sum']
Out[199]: 
EffectiveDate
2017.0   -3660000.0
2018.0   -5520000.0
2019.0   -2460213.0
2020.0    1600000.0
Name: sum, dtype: float64

In [200]: round(npf.irr(np.squeeze(flows_by_year.values)), 4) * 100
Out[200]: -66.25

In [201]: round(npf.irr(flows_by_year['sum']), 4) * 100
Out[201]: -66.25
Community
  • 1
  • 1
Tonechas
  • 13,398
  • 16
  • 46
  • 80
  • That squeeze play is legendary. I'm definitely look more into that for other issues. Thank you so much!!!! – AJames Jun 12 '20 at 13:35