7

Suppose I have the following DataFrame:

   Area
0  14.68
1  40.54
2  10.82
3  2.31
4  22.3

I want to categorize those values into ranges, like A: [1,10], B: [11,20], C..., so that the result looks like this:

   Area
0  B
1  D
2  C
3  A
4  C

How can I do this with pandas? I tried the following code:

bins = pd.IntervalIndex.from_tuples([(0, 11), (11, 20), (20, 50), (50, 100), (100, 500), (500, np.max(df["area"]) + 1)], closed='left')
catDf = pd.cut(df["area"], bins = bins)

But cut just puts the interval values in the DataFrame, and I want the category names instead of the ranges.

EDIT: I tried passing labels to cut, but nothing changed.

EDIT2: To clarify, if "area" has the value 10.21, it falls in the range [10,20], so it must be labeled "B" (or whatever label belongs to that range).
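A possible explanation for the EDIT above: the pandas documentation for cut states that the labels argument is ignored when bins is an IntervalIndex. The sketch below is one way around that, assuming the bins can be written as plain edges. The column name Category, the np.inf upper edge, and the letters A to F are assumptions made for illustration; the sample values come from the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Area": [14.68, 40.54, 10.82, 2.31, 22.3]})

# Plain bin edges instead of an IntervalIndex, so that `labels` is applied.
# np.inf as the last edge is an assumption replacing np.max(...) + 1.
edges = [0, 11, 20, 50, 100, 500, np.inf]
labels = list("ABCDEF")  # one label per interval: len(edges) - 1

# right=False makes the intervals left-closed: [0, 11), [11, 20), ...
# which mirrors closed='left' in the question.
df["Category"] = pd.cut(df["Area"], bins=edges, labels=labels, right=False)
print(df)
```

The result is a categorical column, so the order A < B < ... is preserved for sorting and comparisons.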

demo

3 Answers

4

For me, it works to take cat.codes and use them to index the list a after converting it to a NumPy array:

a = list('ABCDEF')
df['new'] = np.array(a)[pd.cut(df["Area"], bins = bins).cat.codes]
print (df)
     Area new
0   14.68   B
1   40.54   C
2   10.82   A
3    2.31   A
4   22.30   C
5  600.00   F

catDf = pd.Series(np.array(a)[pd.cut(df["Area"], bins = bins).cat.codes], index=df.index)
print (catDf)
0    B
1    C
2    A
3    A
4    C
5    F
dtype: object
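One caveat with indexing by cat.codes: values that fall outside every bin get the code -1, and indexing a NumPy array with -1 silently picks the last label. A minimal sketch of two ways to keep such rows missing instead, assuming a fixed upper bound of 1000 and an extra out-of-range value of -5.0 (both made up for illustration):

```python
import numpy as np
import pandas as pd

# -5.0 is a hypothetical out-of-range value added to show the edge case.
df = pd.DataFrame({"Area": [14.68, 40.54, 10.82, 2.31, 22.3, 600.0, -5.0]})
bins = pd.IntervalIndex.from_tuples(
    [(0, 11), (11, 20), (20, 50), (50, 100), (100, 500), (500, 1000)],
    closed="left",
)
labels = list("ABCDEF")

binned = pd.cut(df["Area"], bins=bins)
codes = binned.cat.codes  # -1 marks values that fall in no bin

# Option 1: index by code as in the answer, then blank out the -1 rows.
mapped = pd.Series(np.array(labels)[codes], index=df.index)
df["new"] = mapped.where(codes != -1)

# Option 2: keep the categorical and rename its interval categories.
df["renamed"] = binned.cat.rename_categories(labels)
print(df)
```

Option 2 keeps the category dtype; option 1 yields an object column like the answer above.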
jezrael
  • Perfect. Thank you, that's exactly what I wanted: an easy and clean solution. I didn't find the "codes" attribute. Thank you again – demo Apr 14 '19 at 16:30
2

You can specify the labels as follows (note that I am not sure which ranges you used):

pd.cut(df.Area, [1,10, 20, 50, 100], labels=['A', 'B', 'C', 'D'])

0    B
1    C
2    B
3    A
4    C
Name: Area, dtype: category
Categories (4, object): [A < B < C < D]
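Which side of each interval is closed decides where boundary values such as 10.0 end up. The sketch below compares the default with right=False and include_lowest=True on the same edges; the test values are made up to sit on or near the boundaries.

```python
import pandas as pd

# Hypothetical values chosen to sit on or near the bin edges.
values = pd.Series([1.0, 10.0, 10.23, 20.0])
edges = [1, 10, 20, 50, 100]
labels = ["A", "B", "C", "D"]

# Default right=True: intervals are (1, 10], (10, 20], ... so 1.0 is in no bin.
right_closed = pd.cut(values, edges, labels=labels)

# right=False: intervals are [1, 10), [10, 20), ... so 10.0 moves to "B".
left_closed = pd.cut(values, edges, labels=labels, right=False)

# include_lowest=True: like the default, but the first interval also includes 1.0.
with_lowest = pd.cut(values, edges, labels=labels, include_lowest=True)

print(pd.DataFrame({
    "value": values,
    "right_closed": right_closed,
    "left_closed": left_closed,
    "with_lowest": with_lowest,
}))
```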
Erfan
  • I need a label for each range of values. Suppose the value 10.23 is in the range [10,20]; then its value will be A. Every range has a label, and if the value of "area" is in a range, it is transformed to that range's label – demo Apr 14 '19 at 16:23
0

Assuming that bins is a global variable, you could do this:

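   def number_to_bin(number):
       # Return the letter of the first bin that contains `number`, or None if none does.
       # Assumes `bins` is the IntervalIndex from the question: iterating it yields
       # pd.Interval objects, and `in` respects their closed side.
       ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
       for i, interval in enumerate(bins):
           if number in interval:
               return ALPHABET[i]
       return None

   df["area"] = df["area"].apply(number_to_bin)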
Ahmed Ragab
  • It works if you have fixed bin values, like 1, 10, 20 and so on. But since I have ranges of values, this solution doesn't work. – demo Apr 14 '19 at 16:37