0

I prefer to use matrix multiplication for coding, because it's so much more efficient than iterating, but curious on how to do this if the dimensions are different.

I have two different dataframes

A:

Orig_vintage
Q12018 185
Q22018. 200

and B:

default_month 1 2 3
orig_vintage
Q12018 0 25 35
Q22018 0 15 45
Q32018 0 35 65

and I'm trying to divide A through columns of B, so the B dataframe becomes (note I've rounded random percentages):

default_month 1 2 3
orig_vintage
Q12018 0 .03 .04
Q22018 0 .04 .05
Q32018 0 .06 .07

But bottom line want to divide the monthly defaults by the total origination figure to get to a monthly default %.

merv
  • 67,214
  • 13
  • 180
  • 245
Benloper
  • 448
  • 4
  • 13

1 Answers1

0
import pandas as pd
import io

df1 = pd.read_csv(
    io.StringIO("""Orig_vintage,Unnamed: 1\nQ12018,185\nQ22018,200\n"""), sep=","
)

df2 = pd.read_csv(
    io.StringIO(
        """default_month,1,2,3\nQ12018,0.0,25.0,35.0\nQ22018,0.0,15.0,45.0\nQ32018,0.0,35.0,65.0\n"""
    ),
    sep=",",
)

df1.set_index("Orig_vintage").join(df2.set_index("default_month"), how="right").pipe(
    lambda d: d.div(d["Unnamed: 1"].fillna(d["Unnamed: 1"].sum()), axis=0)
)
default_month Unnamed: 1 1 2 3
Q12018 1 0 0.135135 0.189189
Q22018 1 0 0.075 0.225
Q32018 nan 0 0.0909091 0.168831
Rob Raymond
  • 29,118
  • 3
  • 14
  • 30