
I want to transform the continuous values in a DataFrame column into discrete values using equal-width partitioning. For example, I want to divide the continuous values in column a of the input below into 3 intervals.

Input:

import pandas as pd 
import numpy as np 
df = pd.DataFrame({'a':[1.1, 1.2, 1.3, 2.4, 2.5, 4.1]})

Output:

     a
0  1.1
1  1.2
2  1.3
3  2.4
4  2.5
5  4.1

In column a, the minimum value is 1.1 and the maximum value is 4.1; I want to divide this range into 3 intervals.

The width of each interval is (4.1 - 1.1) / 3 = 1.0. So all values in the interval [1.1, 2.1) (greater than or equal to 1.1 and less than 2.1) should map to 0, all values in [2.1, 3.1) should map to 1, and all values in [3.1, 4.1] should map to 2.
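The arithmetic above can be sketched directly, as a rough illustration rather than a recommended solution. The names `n_bins`, `width`, and `manual_labels` are made up for this example, and values that sit exactly on an interior edge may be affected by floating-point rounding.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1.1, 1.2, 1.3, 2.4, 2.5, 4.1]})
n_bins = 3  # illustrative name, not from the question

lo, hi = df['a'].min(), df['a'].max()
width = (hi - lo) / n_bins  # (4.1 - 1.1) / 3, about 1.0

# Number of whole interval widths between the minimum and each value.
idx = np.floor((df['a'] - lo) / width).astype(int)

# The maximum sits on the last edge, so fold it into the final interval.
manual_labels = idx.clip(upper=n_bins - 1)
print(manual_labels.tolist())  # [0, 0, 0, 1, 1, 2]
```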

So here is my expected result.

Expected:

   a
0  0
1  0
2  0
3  1
4  1
5  2
rosefun

3 Answers


You can use pd.cut with right=False, so that each bin is closed on the left and open on the right, and pass the integer labels you want:

pd.cut(df.a, bins=3, labels=np.arange(3), right=False)

0    0
1    0
2    0
3    1
4    1
5    2
Name: a, dtype: category
Categories (3, int64): [0 < 1 < 2]

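Omitting labels shows how the bins are formed. Note that pandas extends the last edge slightly (4.103 rather than 4.1) so that the maximum value still falls inside the final half-open bin: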

pd.cut(df.a, bins=3, right=False)

0      [1.1, 2.1)
1      [1.1, 2.1)
2      [1.1, 2.1)
3      [2.1, 3.1)
4      [2.1, 3.1)
5    [3.1, 4.103)
Name: a, dtype: category
Categories (3, interval[float64]): [[1.1, 2.1) < [2.1, 3.1) < [3.1, 4.103)]
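If plain integer codes are preferred over a categorical, one possible variation is to pass labels=False, and retbins=True to also get the computed edges back. This is a sketch on the question's data; the variable names `codes` and `edges` are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'a': [1.1, 1.2, 1.3, 2.4, 2.5, 4.1]})

# labels=False returns the integer bin index instead of a categorical;
# retbins=True additionally returns the array of bin edges pandas computed.
codes, edges = pd.cut(df['a'], bins=3, labels=False, right=False, retbins=True)

print(codes.tolist())  # [0, 0, 0, 1, 1, 2]
print(edges)           # four edges; the last is slightly above 4.1
```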
Space Impact

You can also use np.digitize, passing the left edges of the bins, and subtract 1 to get zero-based labels:

np.digitize(df.a,np.arange(1.1,4.1,1)) - 1

Out:

array([0, 0, 0, 1, 1, 2], dtype=int64)
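The edges above are hard-coded for this column. A possible generalization, sketched here under the assumption that equal-width bins between the column's minimum and maximum are wanted, derives them with np.linspace. The names `n_bins`, `edges`, and `digitize_labels` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1.1, 1.2, 1.3, 2.4, 2.5, 4.1]})
n_bins = 3  # illustrative name

# n_bins + 1 equally spaced edges from the minimum to the maximum.
edges = np.linspace(df['a'].min(), df['a'].max(), n_bins + 1)

# Passing only the interior edges makes digitize return 0 .. n_bins - 1
# directly, and keeps the maximum value in the last bin.
digitize_labels = np.digitize(df['a'], edges[1:-1])
print(digitize_labels.tolist())  # [0, 0, 0, 1, 1, 2]
```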
Naga kiran

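Another option is diff and cumsum. Note that this starts a new group whenever consecutive values differ by something other than 0.1, rather than computing equal-width bins, so it relies on the spacing of this particular data:

df['a'] = (~np.isclose(df['a'].diff(), 0.1)).cumsum() - 1  # np.isclose because the values are floats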
df
Out[395]: 
   a
0  0
1  0
2  0
3  1
4  1
5  2
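To show where the diff/cumsum approach and equal-width binning part ways, here is a small comparison on a hypothetical column (not from the question) whose consecutive values are not 0.1 apart. The frame `other` and the names `diff_labels` and `cut_labels` are invented for this sketch.

```python
import numpy as np
import pandas as pd

# Hypothetical data: same range as the question, different spacing.
other = pd.DataFrame({'a': [1.1, 1.5, 1.9, 4.1]})

# diff/cumsum: a new group starts whenever the step is not close to 0.1.
diff_labels = (~np.isclose(other['a'].diff(), 0.1)).cumsum() - 1

# Equal-width binning into 3 intervals for comparison.
cut_labels = pd.cut(other['a'], bins=3, labels=False, right=False)

print(diff_labels.tolist())  # [0, 1, 2, 3]
print(cut_labels.tolist())   # [0, 0, 0, 2]
```

On this input the two methods disagree, so the diff/cumsum version is best treated as specific to data spaced like the example in the question.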
BENY