generate normalized discrete values for feature engineering

Question

I have a dataframe in which one column stores discrete values, shown as follows. I would like to create another column that stores the normalized values. For instance, for 4050, the corresponding entry would be 4. Is there an efficient way to do this without writing my own function? Does sklearn provide any functions for generating normalized values?

Why would the corresponding entry be 4? What kind of normalization are you going for? there are lots of ways to normalize data... — sacuL, Aug 13 '18 at 21:47
there are around 20 different values, and the range is from 1000 to 9999, so I would like to use every 1000 as a category — user288609, Aug 13 '18 at 21:58

score 0 · Accepted Answer · answered Aug 13 '18 at 22:05

Based on your comment:

there are around 20 different values, and the range is from 1000 to 9999, so I would like to use every 1000 as a category

This isn't really normalization in the strict sense of the word. However, to do that, you can easily use floor division (//):

df['new_column'] = df['values']//1000

For example:

>>> df
   values
0    2021
1    8093
2    9870
3    4508
4    2645
5    1441
6    8888
7    8921
8    7292
9    8571

>>> df['new_column'] = df['values'] // 1000

>>> df
   values  new_column
0    2021           2
1    8093           8
2    9870           9
3    4508           4
4    2645           2
5    1441           1
6    8888           8
7    8921           8
8    7292           7
9    8571           8
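
For anyone who wants to reproduce this, here is a minimal self-contained sketch of the same floor-division approach, using the example values above. The `pd.cut` variant is an addition that was not part of the original answer; the `edges`, `labels`, and `binned` names are hypothetical, and it assumes all values fall in the range 1000 to 9999 mentioned in the comment.

```python
import pandas as pd

# Example values copied from the session above.
df = pd.DataFrame(
    {"values": [2021, 8093, 9870, 4508, 2645, 1441, 8888, 8921, 7292, 8571]}
)

# Floor division: every block of 1000 maps to one integer category.
df["new_column"] = df["values"] // 1000

# Alternative with explicit bin edges (an assumption, not from the answer).
# right=False makes each bin half-open on the right: [1000, 2000), [2000, 3000), ...
edges = list(range(1000, 11000, 1000))  # 1000, 2000, ..., 10000
labels = list(range(1, 10))             # one label per bin: 1..9
df["binned"] = pd.cut(
    df["values"], bins=edges, labels=labels, right=False
).astype(int)

print(df)
```

Floor division is the shorter option when the bins are evenly spaced. Explicit edges may be worth the extra lines if the category boundaries ever need to be uneven.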

answered Aug 13 '18 at 22:05

sacuL


Thanks, do you have any suggestions on how to better normalize them? – user288609 Aug 13 '18 at 22:27
Like I said, this isn't normalization. I'm not sure what you're going for, but if you wanted to normalize them to have a mean of 0 and a standard deviation of 1, I'd suggest using [`StandardScaler`](http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html), but that's a completely different thing from what you're asking, I think. – sacuL Aug 13 '18 at 22:38
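
To illustrate the difference mentioned in the comment above, here is a rough sketch of standardizing the same example column to mean 0 and standard deviation 1. It uses plain pandas rather than `StandardScaler` itself, and it assumes the population standard deviation (`ddof=0`), which is what the scikit-learn documentation describes for `StandardScaler`. The `standardized` column name is hypothetical.

```python
import pandas as pd

# Example values copied from the answer above.
df = pd.DataFrame(
    {"values": [2021, 8093, 9870, 4508, 2645, 1441, 8888, 8921, 7292, 8571]}
)

col = df["values"]

# z-score: subtract the mean, divide by the standard deviation.
# ddof=0 selects the population standard deviation (assumption stated above).
df["standardized"] = (col - col.mean()) / col.std(ddof=0)

print(df)
```

The result is a continuous value per row, not a category, which is why it does not replace the floor-division approach for this question.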
