
I want to use the library to create binning tables with all the metrics, assuming I already have all the bins. I don't want to optimize the binning process; I just want the tables for my current bins. Although I've found a "solution", I'm not sure whether there's a bug or something I'm missing in the parameters. Here is my example. First, I create a fake dataset:

import random
from datetime import datetime, timedelta

import pandas as pd
from optbinning import OptimalBinning


# Set seed for reproducibility
random.seed(42)

# Generate fake data
data = {
    'GB': [random.choice([0, 1]) for _ in range(2000)],
    'Period': [(datetime(2021, 1, 1) + timedelta(days=random.randint(0, 731))).strftime("%m/%Y") for _ in range(2000)],
    'Age': [random.randint(18, 80) if random.random() > 0.2 else None for _ in range(2000)],
    'L6ag': [random.randint(0, 9) if random.random() > 0.2 else None for _ in range(2000)],
    'L_3M': [chr(random.randint(65, 90)) if random.random() > 0.2 else None for _ in range(2000)],
    'M36m': [random.randint(0, 1000) for _ in range(2000)],
    'Balance': [random.randint(0, 100000) for _ in range(2000)]
}

# Create DataFrame
df = pd.DataFrame(data)
df

Then I want to create, for example, a table for Age using the following bins: custom_bins = [28, 37, 63, 67]. So I use the following code:
# Define your custom bins
custom_bins = [28, 37, 63, 67]

# Define the binning object
optb = OptimalBinning(name="Age", dtype="numerical", user_splits=custom_bins)

# Fit the binning object
optb.fit(df["Age"], df["GB"]) # GB is your target variable

optb.binning_table.build()

But the resulting table is missing the first bin, (-inf, 28):

[Image: binning table for Age, without the (-inf, 28) bin]

If I use the user_splits_fixed parameter to "force" every split to be kept, the result is even worse:

# Define your custom bins
custom_bins = [28, 37, 63, 67]
user_splits_fixed = [True, True,  True, True] 

# Define the binning object
optb = OptimalBinning(name="Age", dtype="numerical", user_splits=custom_bins, user_splits_fixed=user_splits_fixed)

# Fit the binning object
optb.fit(df["Age"], df["GB"]) # GB is your target variable

optb.binning_table.build()

[Image: binning table for Age produced with user_splits_fixed]

Any help would be greatly appreciated.

I'd like working code that produces the table while keeping the bins provided by the user.
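If all that is needed is the table for bins you already have, the optimizer can be bypassed entirely. Below is a library-independent sketch using NumPy and pandas; it is not optbinning's API. The function name fixed_binning_table and its column names are hypothetical, chosen only to resemble optbinning's table. The WoE convention ln(share of non-events / share of events) is an assumption; check it against the sign optbinning uses in your version.

```python
import numpy as np
import pandas as pd


def fixed_binning_table(x, y, splits):
    """Hypothetical helper: binning table for user-given splits, no optimization.

    Bins are left-closed/right-open: (-inf, s1), [s1, s2), ..., [sk, inf),
    plus a final "Missing" row for NaN values of x. y must be a 0/1 target.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    splits = np.sort(np.asarray(splits, dtype=float))
    n_bins = len(splits) + 1

    # digitize: x < s1 -> 0, s1 <= x < s2 -> 1, ..., x >= sk -> n_bins - 1
    idx = np.digitize(x, splits, right=False)
    idx[np.isnan(x)] = n_bins  # extra slot reserved for missing values

    count = np.bincount(idx, minlength=n_bins + 1)
    event = np.bincount(idx, weights=y, minlength=n_bins + 1).astype(int)
    non_event = count - event

    # Readable labels, e.g. "(-inf, 28)", "[28, 37)", ..., "[67, inf)"
    edges = ["-inf", *(f"{s:g}" for s in splits), "inf"]
    labels = [f"[{lo}, {hi})" for lo, hi in zip(edges[:-1], edges[1:])]
    labels[0] = "(" + labels[0][1:]
    labels.append("Missing")

    # Assumed WoE convention: ln(share of non-events / share of events)
    p_non_event = non_event / non_event.sum()
    p_event = event / event.sum()
    with np.errstate(divide="ignore", invalid="ignore"):
        woe = np.log(p_non_event / p_event)  # +/-inf or NaN for pure/empty bins
        iv = (p_non_event - p_event) * woe
        event_rate = event / count  # NaN for empty bins

    return pd.DataFrame({
        "Bin": labels,
        "Count": count,
        "Count (%)": count / count.sum(),
        "Non-event": non_event,
        "Event": event,
        "Event rate": event_rate,
        "WoE": woe,
        "IV": iv,
    })


# Small demo: every interval, including (-inf, 28), gets its own row
ages = [20, 30, 30, 40, 50, 64, 70, np.nan]
target = [1, 0, 1, 0, 0, 1, 1, 0]
print(fixed_binning_table(ages, target, [28, 37, 63, 67]))
```

With the question's data it would be called as fixed_binning_table(df["Age"], df["GB"], custom_bins). Unlike optbinning's table it has no "Special" or totals row, and pure bins simply get infinite WoE instead of being merged.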

Mario
TomasLeon
Got the answer: monotonic_trend should be set to None so that the function doesn't perform any further calculation :) – TomasLeon Jun 02 '23 at 17:52
