I'm using a multiple imputer (IterativeImputer) from the scikit-learn library to impute missing values in rain datasets. Each column is a rain station, and the index is a DateTime. I was able to run IterativeImputer and get an output with all my missing values filled. The problem is that the output contains negative values. It's possible to change the min_value that the imputer uses, but as far as I can tell it sets a single value for all the columns. I want to set min_value per column, based on the minimum of each column before the imputation. There is an answer here on Stack Overflow that covers this, but I have no clue how to do it.
The code I'm using:
import pandas as pd
import numpy as np
from sklearn.experimental import enable_iterative_imputer
from sklearn.impute import IterativeImputer
from sklearn.compose import make_column_transformer
from sklearn.compose import make_column_selector
#Babitonga's region stations
babi_ana = pd.read_csv(all_csv_files[0]).set_index("Time")  # Here I read the CSV data
# Transforming my index to datetime
babi_ana.index = pd.to_datetime(babi_ana.index)
mask = (babi_ana.index > ini1) & (babi_ana.index <= fim1) #Selecting the date range
babi_ana1 = babi_ana.loc[mask]
# Applying the imputer
imputer_data = IterativeImputer(random_state=0, skip_complete=True, sample_posterior=True, max_iter=10, missing_values=np.nan)
data = babi_ana1
minimum = data.min(axis=0)  # No negative values in the original
imputer_data.fit(data.values)
data_imputed = imputer_data.transform(data.values)
# Here I realize the output has negative values
data_imputed = pd.DataFrame(data_imputed)
minimum_after = data_imputed.min(axis=0)  # Several negative values, except for 2 stations
I want to be able to set min_value and max_value based on the minimum and maximum of each station before the imputation, like this:
max_imputer = data.iloc[:,:].max(axis = 0)
min_imputer = data.iloc[:,:].min(axis = 0)
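For reference, here is a minimal sketch of per-column bounds. It assumes scikit-learn 0.23 or later, where the IterativeImputer documentation lists min_value and max_value as accepting either a scalar or an array-like with one entry per feature. The toy DataFrame and the station names are hypothetical stand-ins for the real station table, and the sketch assumes no column is entirely missing.

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Hypothetical toy data standing in for the rain-station table
rng = np.random.default_rng(0)
toy = pd.DataFrame(
    rng.gamma(shape=2.0, scale=5.0, size=(60, 3)),
    columns=["station_a", "station_b", "station_c"],
)
toy = toy.mask(rng.random(toy.shape) < 0.2)  # randomly mark roughly 20% of values as missing

# Per-column bounds taken from the observed values (min/max skip NaN by default)
min_imputer = toy.min(axis=0).to_numpy()
max_imputer = toy.max(axis=0).to_numpy()

imputer = IterativeImputer(
    random_state=0,
    sample_posterior=True,
    max_iter=10,
    min_value=min_imputer,  # one lower bound per column
    max_value=max_imputer,  # one upper bound per column
)

# Rebuild a DataFrame so the station names and index are kept
imputed = pd.DataFrame(
    imputer.fit_transform(toy), index=toy.index, columns=toy.columns
)

# Compare per-station minima before and after imputation
print(pd.DataFrame({"before": toy.min(axis=0), "after": imputed.min(axis=0)}))
```

The bounds arrays must follow the same column order as the data passed to the imputer. A station whose observed minimum equals its observed maximum would likely need separate handling, since each lower bound is expected to be strictly below its upper bound.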