
I inherited some VBA code that I want to convert to Python.

Think of a survival matrix where:

  • each row is a different product
  • each column represents the age of the product

I want to create a survival matrix of zeros and then fill it using the normal distribution evaluated at (age - life_exp) / sd, where age is the column number.

Results: the numbers in my DataFrame lifeleft_2 are correct, but they are in the wrong places, the dimensions of the result are wrong, and lifeleft_2's column indexes are broken.

Question: how do I make SciPy return one result per "observation" instead of the whole array in each observation?

    import pandas as pd
    import numpy as np
    from scipy.stats import norm

    df = pd.DataFrame({'qty'      : [20,  30, 40],
                       'price'    : [100, 50, 20],
                       'life_exp' : [5,   4,  3]})
    df['sd'] = df['life_exp'] / 4

    nrows = df.shape[0]
    ncols = df['life_exp'].max()*2 + 1    # "+1" because 0 = equals the past

    # Survival matrix of zeros where column index = age, used for the normal distribution --> (age-life_exp) / sd
    lifeleft = pd.DataFrame(np.zeros((nrows, ncols)))
    l_cols   = lifeleft.columns

[Image: lifeleft]

    # --->  PROBLEM IS HERE  <---
    lifeleft_2 = lifeleft.apply(lambda x: 1 - norm.cdf((col-df['life_exp']) / df['sd']) for col in l_cols)

    display(lifeleft_2)

[Image: lifeleft_2]

Phil P.

1 Answer

    life_left = pd.DataFrame(1 - norm.cdf([(c - df['life_exp']) / df['sd'] for c in range(ncols)])).T
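
For context on why the original attempt misbehaves: the lambda in the question never uses its argument `x`, and the expression inside it works on the whole `df['life_exp']` and `df['sd']` columns, so each call returns a full array rather than a single value. A possible alternative to the list comprehension above is to let NumPy broadcasting build the whole matrix in one step. This is only a sketch under a few assumptions: the names `ages`, `life_exp`, `sd` and `z` are introduced here for illustration, and `norm.sf` (SciPy's survival function, equal to `1 - cdf`) is swapped in for the explicit `1 - norm.cdf(...)`.

```python
import numpy as np
import pandas as pd
from scipy.stats import norm

# Same input data as in the question
df = pd.DataFrame({'qty':      [20, 30, 40],
                   'price':    [100, 50, 20],
                   'life_exp': [5, 4, 3]})
df['sd'] = df['life_exp'] / 4

ncols = int(df['life_exp'].max()) * 2 + 1

# One age per column: 0, 1, ..., ncols - 1, shape (ncols,)
ages = np.arange(ncols)

# One value per product, reshaped to (nrows, 1) so it broadcasts against ages
life_exp = df['life_exp'].to_numpy()[:, None]
sd = df['sd'].to_numpy()[:, None]

# Broadcasting (ncols,) against (nrows, 1) gives an (nrows, ncols) matrix
z = (ages - life_exp) / sd

# norm.sf(z) is the survival function, i.e. 1 - norm.cdf(z)
lifeleft_2 = pd.DataFrame(norm.sf(z), index=df.index, columns=ages)
```

With this layout each row is a product and each column label is an age, so `lifeleft_2.loc[0, 5]` is the survival probability of the first product at age 5, which is its `life_exp`.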

Filip