I want to use the optbinning library to build binning tables with all the metrics, but I already have my bins. I don't want to optimize the binning; I just want the tables for the bins I provide. I found a partial "solution", but I'm not sure whether there is a bug or whether I'm missing something in the parameters. Here is my example.
First, I create a fake dataset:
import random
import pandas as pd
from datetime import datetime, timedelta
from optbinning import BinningProcess, OptimalBinning
# Set seed for reproducibility
random.seed(42)
# Generate fake data
data = {
'GB': [random.choice([0, 1]) for _ in range(2000)],
'Period': [(datetime(2021, 1, 1) + timedelta(days=random.randint(0, 731))).strftime("%m/%Y") for _ in range(2000)],
'Age': [random.randint(18, 80) if random.random() > 0.2 else None for _ in range(2000)],
'L6ag': [random.randint(0, 9) if random.random() > 0.2 else None for _ in range(2000)],
'L_3M': [chr(random.randint(65, 90)) if random.random() > 0.2 else None for _ in range(2000)],
'M36m': [random.randint(0, 1000) for _ in range(2000)],
'Balance': [random.randint(0, 100000) for _ in range(2000)]
}
# Create DataFrame
df = pd.DataFrame(data)
df
Then I want to create, for example, a table for Age using the following bins: custom_bins = [28, 37, 63, 67]
So I use the following code:
# Define your custom bins
custom_bins = [28, 37, 63, 67]
# Define the binning object
optb = OptimalBinning(name="Age", dtype="numerical", user_splits=custom_bins)
# Fit the binning object
optb.fit(df["Age"], df["GB"]) # GB is your target variable
optb.binning_table.build()
And I get the following table, which is missing the first bin (-inf to 28):
If I try using the user_splits_fixed parameter to force every split to be kept, the result is even worse:
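To make clear what I expect, here is a minimal sketch in plain Python of the per-bin counts for my splits, independent of optbinning. It assumes right-open bins such as [28, 37), which is how the bin labels in the optbinning table appear to be defined, plus a separate row for missing values. `count_bins` is my own hypothetical helper, not an optbinning function.

```python
import bisect


def count_bins(values, targets, splits):
    """Count records, non-events (0) and events (1) per user-defined bin.

    Assumed bin layout: (-inf, s0), [s0, s1), ..., [s_last, inf),
    with None/NaN values collected in a separate "Missing" row.
    """
    labels = (
        [f"(-inf, {splits[0]})"]
        + [f"[{a}, {b})" for a, b in zip(splits, splits[1:])]
        + [f"[{splits[-1]}, inf)", "Missing"]
    )
    table = {label: {"count": 0, "non_event": 0, "event": 0} for label in labels}
    for value, target in zip(values, targets):
        # value != value is True only for NaN (pandas turns None into NaN)
        if value is None or value != value:
            label = "Missing"
        else:
            # bisect_right sends a value equal to a split into the bin on its right
            label = labels[bisect.bisect_right(splits, value)]
        row = table[label]
        row["count"] += 1
        row["event" if target == 1 else "non_event"] += 1
    return table


# Tiny hand-made sample, chosen so the boundary values 28, 37, 63, 67 are exercised
ages = [20, 28, 36, 37, 63, 66, 67, 80, None]
gb = [0, 1, 0, 1, 1, 0, 1, 0, 1]
for label, row in count_bins(ages, gb, [28, 37, 63, 67]).items():
    print(label, row)
```

On the real data this would be called as count_bins(df["Age"].tolist(), df["GB"].tolist(), custom_bins). The point is that all five bins, including (-inf, 28), should appear in the table.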
# Define your custom bins
custom_bins = [28, 37, 63, 67]
user_splits_fixed = [True, True, True, True]
# Define the binning object
optb = OptimalBinning(name="Age", dtype="numerical", user_splits=custom_bins, user_splits_fixed=user_splits_fixed)
# Fit the binning object
optb.fit(df["Age"], df["GB"]) # GB is your target variable
optb.binning_table.build()
Any help would be much appreciated.
Ideally, I'd like code that produces the binning table while keeping exactly the bins provided by the user.