-2

I have a dataframe in which one column stores discrete values. I would like to create another column that stores the normalized values. For instance, for 4050, the corresponding entry would be 4. Is there an efficient way to do this without writing my own function? Does scikit-learn provide any functions for generating normalized values?

user288609
  • Why would the corresponding entry be 4? What kind of normalization are you going for? there are lots of ways to normalize data... – sacuL Aug 13 '18 at 21:47
  • there are around 20 different values, and the range is from 1000 to 9999, so I would like to use every 1000 as a category – user288609 Aug 13 '18 at 21:58

1 Answer

0

Based on your comment:

there are around 20 different values, and the range is from 1000 to 9999, so I would like to use every 1000 as a category

This isn't normalization in the strict sense of the word; it is closer to binning. To assign each value to its thousand-wide category, you can use floor division (`//`):

df['new_column'] = df['values']//1000

For example:

>>> df
   values
0    2021
1    8093
2    9870
3    4508
4    2645
5    1441
6    8888
7    8921
8    7292
9    8571

>>> df['new_column'] = df['values']//1000

>>> df
   values  new_column
0    2021           2
1    8093           8
2    9870           9
3    4508           4
4    2645           2
5    1441           1
6    8888           8
7    8921           8
8    7292           7
9    8571           8
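
For anyone who wants to run this directly, here is a minimal self-contained sketch of the example above. The column names `values` and `new_column` and the ten sample values come from the answer. The `cut_column` part is an assumption on my side and not something the answer proposes: it uses `pd.cut` with explicit bin edges every 1000, which may be handy if the categories ever need uneven widths.

```python
import pandas as pd

# Same ten sample values as in the answer above.
df = pd.DataFrame(
    {"values": [2021, 8093, 9870, 4508, 2645, 1441, 8888, 8921, 7292, 8571]}
)

# Floor division maps every value in [k*1000, k*1000 + 999] to category k.
df["new_column"] = df["values"] // 1000

# Alternative (assumption, not from the answer): explicit bin edges with pd.cut.
# Edges are 1000, 2000, ..., 10000; right=False makes each bin [low, high).
edges = list(range(1000, 11000, 1000))
labels = list(range(1, 10))  # one label per bin: 1..9
df["cut_column"] = pd.cut(
    df["values"], bins=edges, right=False, labels=labels
).astype(int)

print(df)
```

For equal-width bins starting at a multiple of 1000, both columns should agree, and floor division is the simpler choice.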
sacuL
  • Thanks, do you have any suggestions on how to better normalize them? – user288609 Aug 13 '18 at 22:27
  • Like I said, this isn't normalization. I'm not sure what you're going for, but if you wanted to normalize them to have a mean of 0 and a standard deviation of 1, I'd suggest using [`StandardScaler`](http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html), but that's a completely different thing from what you're asking, I think. – sacuL Aug 13 '18 at 22:38
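
To illustrate the difference mentioned in the last comment, here is a rough sketch of standardizing the same column with scikit-learn's `StandardScaler`. The column name `scaled` is hypothetical and the sample values are reused from the answer; this is only meant to show what "mean of 0 and standard deviation of 1" looks like in practice, not to suggest it is what the question needs.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Same ten sample values as in the answer.
df = pd.DataFrame(
    {"values": [2021, 8093, 9870, 4508, 2645, 1441, 8888, 8921, 7292, 8571]}
)

scaler = StandardScaler()

# fit_transform expects 2-D input, hence the double brackets;
# ravel() flattens the (n, 1) result back into a single column.
df["scaled"] = scaler.fit_transform(df[["values"]]).ravel()

print(df)
```

Note that `StandardScaler` uses the population standard deviation, so the scaled column has a standard deviation of 1 when computed with `ddof=0`. The result is continuous rather than categorical, which is why it does not replace the floor-division approach.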