
I am a Python beginner and am trying to use the following code from this source: Portfolio rebalancing with bandwidth method in python

The code works well so far.

The problem is that if I call the function not on the whole DataFrame, as in rebalance(df, tol), but from a certain position in the DataFrame onward, as in rebalance(df[500:], tol), I get the following error:

AttributeError: 'DataFrame' object has no attribute 'colmap'

How do I need to adjust the code to make this possible?

Here is the code:


import datetime as DT
import numpy as np
import pandas as pd
import pandas.io.data as PID

def setup_df():
    df1 = PID.get_data_yahoo("IBM", 
                             start=DT.datetime(1970, 1, 1), 
                             end=DT.datetime.today())
    df1.rename(columns={'Adj Close': 'ibm'}, inplace=True)

    df2 = PID.get_data_yahoo("F", 
                             start=DT.datetime(1970, 1, 1), 
                             end=DT.datetime.today())
    df2.rename(columns={'Adj Close': 'ford'}, inplace=True)

    df = df1.join(df2.ford, how='inner')
    df = df[['ibm', 'ford']]
    df['sh ibm'] = 0.0
    df['sh ford'] = 0.0
    df['ibm value'] = 0.0
    df['ford value'] = 0.0
    df['ratio'] = 0.0
    # This is useful in conjunction with iloc for referencing column names by
    # index number
    df.colmap = dict([(col, i) for i,col in enumerate(df.columns)])
    return df

def invest(df, i, amount):
    """
    Invest amount dollars evenly between ibm and ford
    starting at ordinal index i.
    This modifies df.
    """
    c = df.colmap
    halfvalue = amount/2
    df.iloc[i:, c['sh ibm']] = halfvalue / df.iloc[i, c['ibm']]
    df.iloc[i:, c['sh ford']] = halfvalue / df.iloc[i, c['ford']]

    df.iloc[i:, c['ibm value']] = (
        df.iloc[i:, c['ibm']] * df.iloc[i:, c['sh ibm']])
    df.iloc[i:, c['ford value']] = (
        df.iloc[i:, c['ford']] * df.iloc[i:, c['sh ford']])
    df.iloc[i:, c['ratio']] = (
        df.iloc[i:, c['ibm value']] / df.iloc[i:, c['ford value']])

def rebalance(df, tol):
    """
    Rebalance df whenever the ratio falls outside the tolerance range.
    This modifies df.
    """
    i = 0
    amount = 100
    c = df.colmap
    while True:
        invest(df, i, amount)
        mask = (df['ratio'] >= 1+tol) | (df['ratio'] <= 1-tol)
        # ignore prior locations where the ratio falls outside tol range
        mask[:i] = False
        try:
            # Move i one index past the first index where mask is True
            # Note that this means the ratio at i - 1 will remain outside tol range
            i = np.where(mask)[0][0] + 1
            # Raises IndexError if the triggering row was the last one
            amount = (df.iloc[i, c['ibm value']] + df.iloc[i, c['ford value']])
        except IndexError:
            break
    return df

df = setup_df()
tol = 0.05 #setting the bandwidth tolerance
rebalance(df, tol)

df['portfolio value'] = df['ibm value'] + df['ford value']
df["ibm_weight"] = df['ibm value']/df['portfolio value']
df["ford_weight"] = df['ford value']/df['portfolio value']

print(df['ibm_weight'].min())
print(df['ibm_weight'].max())
print(df['ford_weight'].min())
print(df['ford_weight'].max())

# This shows the rows which trigger rebalancing
mask = (df['ratio'] >= 1+tol) | (df['ratio'] <= 1-tol)
print(df.loc[mask])

1 Answer


The problem you encountered is due to a poor design decision on my part. colmap is an attribute defined on df in setup_df:

df.colmap = dict([(col, i) for i,col in enumerate(df.columns)])

It is not a standard attribute of a DataFrame.

df[500:] returns a new DataFrame object built from a portion of df. Since colmap is not a standard attribute, pandas does not carry it over to the new DataFrame.
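A minimal sketch with a toy three-row frame (hypothetical data, not the stock prices) shows the effect: the ad-hoc attribute exists on the original object but not on the slice. pandas may warn when a list-like value such as a dict is assigned to a new attribute name, so that warning is silenced here.

```python
import warnings

import pandas as pd

df = pd.DataFrame({'ibm': [1.0, 2.0, 3.0], 'ford': [4.0, 5.0, 6.0]})

# Attach an ad-hoc attribute, as setup_df does. pandas may warn that this does
# not create a column, so that warning is silenced here.
with warnings.catch_warnings():
    warnings.simplefilter('ignore')
    df.colmap = {col: i for i, col in enumerate(df.columns)}

sub = df[1:]  # a new DataFrame object holding rows 1 and 2

print(hasattr(df, 'colmap'))   # True
print(hasattr(sub, 'colmap'))  # False: the attribute was not carried over
```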

To call rebalance on a DataFrame other than the one returned by setup_df, replace c = df.colmap in both invest and rebalance with

c = dict([(col, j) for j,col in enumerate(df.columns)])
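The replacement line builds a fresh column-name-to-position map from whatever DataFrame it receives. A small sketch (the empty frame is only for illustration; its column list mirrors setup_df) writes the same mapping as a dict comprehension and checks it against pandas' Index.get_loc:

```python
import pandas as pd

# Empty frame with the same columns that setup_df creates (illustration only)
df = pd.DataFrame(columns=['ibm', 'ford', 'sh ibm', 'sh ford',
                           'ibm value', 'ford value', 'ratio'])

# Same mapping as dict([(col, j) for j, col in enumerate(df.columns)])
c = {col: j for j, col in enumerate(df.columns)}

# Index.get_loc looks up one column's position directly
assert c['ratio'] == df.columns.get_loc('ratio') == 6
print(c)
```

Because the map is rebuilt from df.columns on each call, it works equally for df and for any slice of it.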

I've made this change in the original post as well.

PS. In the other question, I had chosen to define colmap on df itself so that this dict would not have to be recomputed with every call to rebalance and invest.

Your question shows me that this minor optimization is not worth making these functions so dependent on the specialness of the DataFrame returned by setup_df.


There is a second problem you will encounter using rebalance(df[500:], tol):

Since df[500:] returns a new DataFrame holding a portion of df, rebalance(df[500:], tol) modifies that new object, not the original df (pandas does not guarantee that writes to a slice reach the frame it came from). If nothing outside the call keeps a reference to df[500:], for example because the return value of rebalance is discarded, it will be garbage collected once rebalance returns, and the entire computation is lost. Therefore rebalance(df[500:], tol) is not useful.
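To illustrate the lost result, here is a sketch with a hypothetical stand-in function, mark, that, like rebalance, writes into the frame it receives and returns it. It slices with an explicit .copy() so the outcome does not depend on whether pandas happens to return a view or a copy:

```python
import pandas as pd

df = pd.DataFrame({'ratio': [0.0] * 5})


def mark(frame):
    """Hypothetical stand-in for rebalance: writes into frame and returns it."""
    frame.iloc[:, 0] = 1.0
    return frame


mark(df[2:].copy())            # return value discarded: the work is lost
print(df['ratio'].tolist())    # [0.0, 0.0, 0.0, 0.0, 0.0]

part = mark(df[2:].copy())     # keeping the return value keeps the result
print(part['ratio'].tolist())  # [1.0, 1.0, 1.0]
```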

Instead, you could modify rebalance to accept i as a parameter:

def rebalance(df, tol, i=0):
    """
    Rebalance df whenever the ratio falls outside the tolerance range.
    This modifies df.
    """
    c = dict([(col, j) for j, col in enumerate(df.columns)])
    while True:
        mask = (df['ratio'] >= 1+tol) | (df['ratio'] <= 1-tol)
        # ignore prior locations where the ratio falls outside tol range
        mask[:i] = False
        try:
            # Move i one index past the first index where mask is True
            # Note that this means the ratio at i - 1 will remain outside tol range
            i = np.where(mask)[0][0] + 1
            # Raises IndexError if the triggering row was the last one
            amount = (df.iloc[i, c['ibm value']] + df.iloc[i, c['ford value']])
        except IndexError:
            break
        invest(df, i, amount)
    return df

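This version does not make the initial investment itself, so df must already hold one (for example from invest(df, 0, 100)). Then you can rebalance df starting at row position 500 using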

rebalance(df, tol, i=500)

Note that this finds the first row at or after position 500 whose ratio lies outside the tolerance band and rebalances on the row after it. It does not necessarily rebalance at position 500 itself. This allows you to call rebalance(df, tol, i) for arbitrary i without having to determine in advance whether rebalancing is required on row i.
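Here is a minimal end-to-end sketch of that workflow. It assumes synthetic random-walk prices instead of the Yahoo download (the pandas.io.data module used in the question no longer ships with pandas), and make_df is a hypothetical stand-in for setup_df. invest and rebalance follow the answer's logic with the column map built locally; this rebalance also stops when the triggering row is the last one instead of indexing past the end, and an explicit invest(df, 0, 100) supplies the initial investment.

```python
import numpy as np
import pandas as pd


def make_df(n=1000, seed=0):
    """Hypothetical offline stand-in for setup_df, using synthetic random-walk prices."""
    rng = np.random.default_rng(seed)
    idx = pd.bdate_range('2000-01-03', periods=n)
    log_returns = rng.normal(0.0, 0.01, size=(n, 2))
    prices = 100 * np.exp(np.cumsum(log_returns, axis=0))
    df = pd.DataFrame(prices, index=idx, columns=['ibm', 'ford'])
    for col in ['sh ibm', 'sh ford', 'ibm value', 'ford value', 'ratio']:
        df[col] = 0.0  # float columns, so later float writes keep the dtype
    return df


def invest(df, i, amount):
    """Invest amount dollars evenly between ibm and ford from position i on (modifies df)."""
    c = {col: j for j, col in enumerate(df.columns)}  # built locally, no df.colmap
    half = amount / 2
    df.iloc[i:, c['sh ibm']] = half / df.iloc[i, c['ibm']]
    df.iloc[i:, c['sh ford']] = half / df.iloc[i, c['ford']]
    df.iloc[i:, c['ibm value']] = df.iloc[i:, c['ibm']] * df.iloc[i:, c['sh ibm']]
    df.iloc[i:, c['ford value']] = df.iloc[i:, c['ford']] * df.iloc[i:, c['sh ford']]
    df.iloc[i:, c['ratio']] = df.iloc[i:, c['ibm value']] / df.iloc[i:, c['ford value']]


def rebalance(df, tol, i=0):
    """Rebalance df at or after position i whenever the ratio leaves the band (modifies df)."""
    c = {col: j for j, col in enumerate(df.columns)}
    while True:
        out = (df['ratio'] >= 1 + tol) | (df['ratio'] <= 1 - tol)
        mask = out.to_numpy(copy=True)
        mask[:i] = False  # ignore positions before i
        hits = np.flatnonzero(mask)
        # Stop if nothing is out of band, or if the trigger is the last row
        if len(hits) == 0 or hits[0] + 1 >= len(df):
            break
        i = int(hits[0]) + 1  # rebalance on the row after the trigger
        amount = df.iloc[i, c['ibm value']] + df.iloc[i, c['ford value']]
        invest(df, i, amount)
    return df


tol = 0.05
df = make_df()
invest(df, 0, 100)          # initial purchase at position 0
rebalance(df, tol, i=500)   # only rows at or after position 500 can trigger
```

After the call, rows 0 through 500 still carry the shares bought by the initial investment, and every row at or after position 500 whose ratio is outside the band is followed by a row where the ratio is back to 1.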

unutbu