
I have an issue when using sklearn.preprocessing.MinMaxScaler on a large array and then obtaining the scaling parameters to "undo" the normalization after working with the array for a while.

The issue is that after calling MinMaxScaler.fit_transform(data), where data is a numpy array with shape (8, 412719), the scaling parameters obtained from MinMaxScaler.scale_ are just a list of length 412719.

How do I obtain an array of scaling parameters instead? Unless I've misunderstood something, I'm missing 7 columns' worth of scaling parameters.

Boston
  • You should convert the array to a dataframe where each column is a feature, then create the X columns and the y target and filter. Apply the MinMaxScaler to X and assign the result back to X. You will see the features as a numpy array, i.e. a list of lists, which the classifier can parse. – Golden Lion Nov 03 '22 at 14:37
  • @GoldenLion thanks for the response. I feel like this might work, and I'll give it a try, but I have a feeling that there's a simpler solution I'm missing completely. Just for clarification, MinMaxScaler works in such a way that each element gets its own scaling factor, correct? Or does each column get a scaling value? – Boston Nov 03 '22 at 15:25
  • The scaler normalizes the data using the min and max values. Each normalized value is a number between 0 and 1. – Golden Lion Nov 03 '22 at 17:27
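
To make the per-column behavior discussed above concrete, here is a minimal sketch. It assumes a small random array as a stand-in for the real (8, 412719) data; the names `data`, `row_scaler`, and `scaled_rows` are illustrative only. MinMaxScaler treats each column as a feature, so `scale_` holds one value per column. If the 8 rows are really the quantities to be scaled, transposing before fitting is one possible way to get 8 scaling parameters instead.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical stand-in for the real data: 8 rows, 1000 columns
rng = np.random.default_rng(0)
data = rng.normal(size=(8, 1000))

# MinMaxScaler scales each column (feature) independently,
# so scale_ has one entry per column
scaler = MinMaxScaler()
scaled = scaler.fit_transform(data)
print(scaler.scale_.shape)  # one scaling factor per column

# Assumption: the 8 rows are the features of interest.
# Transpose so that rows become columns, scale, then transpose back.
row_scaler = MinMaxScaler()
scaled_rows = row_scaler.fit_transform(data.T).T
print(row_scaler.scale_.shape)  # one scaling factor per original row
```

Whether to transpose depends on what a "feature" means in the data; the sketch only shows how the shape of `scale_` follows from the orientation of the input.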

1 Answer

I build my X dataframe and y target, then scale the X dataframe:

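from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import MinMaxScaler

# df3 is the existing pandas DataFrame holding the features and the "Target" column
df3.dropna(inplace=True)

# Keep every column except the target and the columns excluded from the model
excluded = ["Target", "DateTime", "Date", "CO2Intensity", "ActualWindProduction", "ORKWindspeed", "ForecastWindProduction"]
X_Columns = [column for column in df3.columns if column not in excluded]
X = df3[X_Columns]
y = df3["Target"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=0)

# Fit the scaler on the training split only, then reuse it for the test split
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

classifier = GaussianNB()
classifier.fit(X_train_scaled, y_train)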
Golden Lion
  • I see what you're doing here, and it's really exactly what I want to do. My problem is that I need to get the "non-normalized" values back after doing the train_test_split, so I need the MinMaxScaler.scale_ values to do this, and these are the values I'm not sure how to obtain correctly. – Boston Nov 03 '22 at 18:16
  • You can normalize the X_train. – Golden Lion Nov 03 '22 at 21:30
  • I changed when normalization occurs. Try this. – Golden Lion Nov 04 '22 at 12:46
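
As a possible way to get the "non-normalized" values back, the sketch below uses the fitted scaler's `inverse_transform` method, and also shows the equivalent manual computation from the `scale_` and `min_` attributes. The random `X_train` array is a hypothetical stand-in for the real training split, not the data from the answer.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical training data: 100 samples, 3 features
rng = np.random.default_rng(0)
X_train = rng.uniform(-5.0, 5.0, size=(100, 3))

scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)

# Option 1: let the fitted scaler reverse its own transformation
X_restored = scaler.inverse_transform(X_train_scaled)

# Option 2: reverse it by hand.
# The forward transform is X_scaled = X * scale_ + min_,
# so the inverse is X = (X_scaled - min_) / scale_
X_manual = (X_train_scaled - scaler.min_) / scaler.scale_

print(np.allclose(X_restored, X_train))
print(np.allclose(X_manual, X_train))
```

Both routes rely on keeping the same fitted `scaler` object around, since `scale_` and `min_` alone (one value per column each) are what define the mapping back to the original range.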