
I wish to transform every column in a dataset so that its entries lie between 0 and 1, based on that column's min/max. I get the min/max of each column with df.minmax(col_names) and then want to compute the column width col_width = col_max - col_min. With this, I wish to transform the data: df = (df - col_min)/col_width.

How can I perform this operation so that every entry is calculated based on the column it belongs to?
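The arithmetic being asked for can be sketched with plain NumPy arrays. NumPy stands in for the DataFrame here (an assumption made only to keep the sketch self-contained; it is not vaex's API), and `minmax_scale_columns` is a hypothetical helper name. Broadcasting subtracts each column's own minimum and divides by each column's own width, which is exactly "every entry is calculated based on the column it belongs to".

```python
import numpy as np


def minmax_scale_columns(data: np.ndarray) -> np.ndarray:
    """Scale each column of a 2-D array to [0, 1] using that column's own min/max.

    Hypothetical helper for illustration; not part of NumPy or vaex.
    """
    col_min = data.min(axis=0)  # one minimum per column, shape (n_cols,)
    col_max = data.max(axis=0)  # one maximum per column, shape (n_cols,)
    col_width = col_max - col_min
    # A constant column has width 0; use 1 there so its entries map to 0 instead of NaN.
    safe_width = np.where(col_width == 0, 1.0, col_width)
    # Broadcasting: (n_rows, n_cols) - (n_cols,) subtracts each column's own minimum,
    # then each column is divided by its own width.
    return (data - col_min) / safe_width


data = np.array([
    [1.0, 10.0, 5.0],
    [3.0, 20.0, 5.0],
    [5.0, 40.0, 5.0],
])
scaled = minmax_scale_columns(data)
print(scaled)
```

The guard for zero-width (constant) columns is a design choice of this sketch; without it, `(df - col_min)/col_width` divides by zero for such columns.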

afriedman111
  • You can use `.apply()` to run a function on every column separately. Inside this function you can calculate `col_min`, `col_max`, `col_width` and `(column - col_min)/col_width`. – furas Sep 04 '22 at 22:03
    I'm not sure, but maybe you can simply use `normalize()`. – furas Sep 04 '22 at 22:05
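The per-column idea from the first comment (run one function on each column separately) can be sketched without any DataFrame library. Columns are modelled here as a plain dict of lists, and `scale_column` / `apply_per_column` are hypothetical names for illustration only; they are not pandas' or vaex's `.apply()`.

```python
from typing import Callable, Dict, List

Column = List[float]


def scale_column(values: Column) -> Column:
    """Min-max scale a single column using only that column's min and max."""
    col_min, col_max = min(values), max(values)
    col_width = col_max - col_min
    if col_width == 0:
        # Constant column: every entry equals the minimum, so map all of them to 0.0.
        return [0.0 for _ in values]
    return [(v - col_min) / col_width for v in values]


def apply_per_column(
    table: Dict[str, Column], func: Callable[[Column], Column]
) -> Dict[str, Column]:
    """Call `func` once per column, so each column is scaled independently."""
    return {name: func(values) for name, values in table.items()}


table = {"a": [2.0, 4.0, 6.0], "b": [-1.0, 0.0, 1.0], "c": [7.0, 7.0, 7.0]}
scaled = apply_per_column(table, scale_column)
print(scaled)
```

Because `scale_column` only ever sees one column, every entry is necessarily scaled by the min and width of the column it came from.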

1 Answer


You can do this quite easily with vaex. Consider this example:

import vaex
import vaex.ml

df = vaex.example()

# functional API
df_with_scaled_features = df.ml.minmax_scaler()

# scikit-learn like API
scaler = vaex.ml.MinMaxScaler(features=df.get_column_names())  # list the columns to scale explicitly
df_with_scaled_features_again = scaler.fit_transform(df)

Of course, you can do the maths yourself, but there is little reason to when vaex already implements this efficiently.

Joco