I have a dataframe in which one column stores discrete values, shown as follows. I would like to create another column storing the normalized values. For instance, for `4050`, the corresponding entry would be `4`. Is there an efficient way to do that instead of writing my own function? Does scikit-learn provide a function for generating normalized values?

user288609
- Why would the corresponding entry be 4? What kind of normalization are you going for? there are lots of ways to normalize data... – sacuL Aug 13 '18 at 21:47
- there are around 20 different values, and the range is from 1000 to 9999, so I would like to use every 1000 as a category – user288609 Aug 13 '18 at 21:58
1 Answer
Based on your comment:
there are around 20 different values, and the range is from 1000 to 9999, so I would like to use every 1000 as a category
This isn't really normalization in the strict sense of the word; grouping values into ranges like this is usually called binning. To do that, you can simply use floor division (`//`):
df['new_column'] = df['values']//1000
For example:
>>> df
   values
0    2021
1    8093
2    9870
3    4508
4    2645
5    1441
6    8888
7    8921
8    7292
9    8571
>>> df['new_column'] = df['values']//1000
>>> df
   values  new_column
0    2021           2
1    8093           8
2    9870           9
3    4508           4
4    2645           2
5    1441           1
6    8888           8
7    8921           8
8    7292           7
9    8571           8
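As a self-contained sketch of the same idea, the snippet below rebuilds the example column and applies floor division. It also shows `pd.cut` as one possible alternative when the category edges need to be explicit. The `bucket` column name and the `edges` list are illustrative assumptions, not part of the question.

```python
import pandas as pd

# Same sample data as in the example above.
df = pd.DataFrame(
    {"values": [2021, 8093, 9870, 4508, 2645, 1441, 8888, 8921, 7292, 8571]}
)

# Floor division maps 1000-1999 to 1, 2000-2999 to 2, ..., 9000-9999 to 9.
df["new_column"] = df["values"] // 1000

# Illustrative alternative: pd.cut with explicit, left-closed bins
# [1000, 2000), [2000, 3000), ..., [9000, 10000), labelled 1 through 9.
edges = list(range(1000, 11000, 1000))  # assumed edges: 1000, 2000, ..., 10000
df["bucket"] = pd.cut(
    df["values"], bins=edges, right=False, labels=list(range(1, 10))
)

print(df)
```

For evenly spaced categories like these, floor division is the simpler choice; `pd.cut` may be more useful if the ranges are uneven or need custom labels.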

sacuL
- Thanks, do you have any suggestions on how to better normalize them? – user288609 Aug 13 '18 at 22:27
- Like I said, this isn't normalization. I'm not sure what you're going for, but if you wanted to normalize them to have a mean of 0 and a standard deviation of 1, I'd suggest using [`StandardScaler`](http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html), but that's a completely different thing from what you're asking, I think. – sacuL Aug 13 '18 at 22:38
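To illustrate the `StandardScaler` suggestion in the comment above, here is a minimal sketch that standardizes the same sample column to zero mean and unit standard deviation. It assumes scikit-learn is installed; the `standardized` column name is hypothetical.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Same sample data as in the answer.
df = pd.DataFrame(
    {"values": [2021, 8093, 9870, 4508, 2645, 1441, 8888, 8921, 7292, 8571]}
)

scaler = StandardScaler()

# fit_transform expects 2-D input, hence the double brackets; it returns an
# array of shape (n_samples, 1), which ravel() flattens back to one column.
df["standardized"] = scaler.fit_transform(df[["values"]]).ravel()

print(df)
```

Note that `StandardScaler` divides by the population standard deviation (`ddof=0`), so comparing against pandas requires `df["standardized"].std(ddof=0)` rather than the default `std()`.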