Remove decimal points from pandas qcut intervals (transform intervals to integers)

Question

I have many scores in a column of a DataFrame named example. I want to split these scores into deciles and assign the corresponding decile interval to each row. I tried the following:

import random
import pandas as pd
random.seed(420) #blazeit
example = pd.DataFrame({"Score":[random.randrange(350, 1000) for i in range(1000)]})
example["Decile"] = pd.qcut(example["Score"], 10, labels=False) + 1 # Deciles as integer from 1 to 10
example["Decile_interval"] = pd.qcut(example["Score"], 10) # Decile as interval

This gives me the deciles I'm looking for. However, I would like the interval endpoints in example["Decile_interval"] to be integers, not floats. I tried passing precision=0 to pd.qcut, but the endpoints are still floats and just show .0 at the end of each number.

How can I transform the floats in the intervals to integers?

EDIT: As pointed out by @ALollz, doing this will change the decile distribution. However, I am doing this for presentation purposes, so I am not worried about this. Props to @JuanC for realizing this and posting a solution.

Well if you round the endpoints to integers you'll no longer have deciles... So what's more important? — ALollz, Sep 09 '19 at 15:16
@ALollz I'd rather have rounded intervals than exact deciles. An alternative would be to create a new column that simply printed the intervals as integers while keeping the true values in the original column. — Arturo Sbr, Sep 09 '19 at 15:25
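The alternative Arturo mentions (keep the exact intervals and add a separate column that only displays them with integer endpoints) could look roughly like this. It is a minimal sketch, assuming the `example` DataFrame from the question; the column name `Decile_label` is hypothetical. Each exact interval category is renamed to a string with rounded endpoints, so the original `Decile_interval` column is left untouched.

```python
import random

import pandas as pd

random.seed(420)
example = pd.DataFrame({"Score": [random.randrange(350, 1000) for i in range(1000)]})
example["Decile_interval"] = pd.qcut(example["Score"], 10)  # exact (float) intervals

# One display string per category, with both endpoints rounded to integers.
# "Decile_label" is a hypothetical column name used only for illustration.
cats = example["Decile_interval"].cat.categories
labels = [f"({int(round(iv.left))}, {int(round(iv.right))}]" for iv in cats]

# rename_categories returns a new categorical and keeps the codes,
# so every row stays in the same decile; only the displayed text changes.
example["Decile_label"] = example["Decile_interval"].cat.rename_categories(labels)
```

Because only the category labels are renamed, the true float intervals remain available in `Decile_interval` for any further computation, while `Decile_label` is used for display.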

score 4 · Accepted Answer · answered Sep 09 '19 at 15:40

This is my solution using a simple apply function:

# Round the left and right endpoints of each interval to the nearest integer
example["Decile_interval"] = example["Decile_interval"].apply(lambda x: pd.Interval(left=int(round(x.left)), right=int(round(x.right))))

answered Sep 09 '19 at 15:40

Massifox

score 2 · Answer 2 · answered Sep 09 '19 at 15:27

There might be a better solution, but this works:

import numpy as np

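# Round both endpoints of each decile interval to the nearest integer
int_categories = [pd.Interval(int(np.round(i.left)), int(np.round(i.right)))
                  for i in example["Decile_interval"].cat.categories]
# Relabel the categories; rename_categories keeps every row in its original decile
example["Decile_interval"] = example["Decile_interval"].cat.rename_categories(int_categories)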

Output:

0      (350, 418]
1      (680, 740]
2      (606, 680]
3      (740, 798]
4      (418, 474]
5      (418, 474]
.           .

answered Sep 09 '19 at 15:27

Juan C

The only issue is that `pd.qcut` is slightly smarter and lowers the leftmost bin edge to 349.999, so that `350` is included in the first bin rather than excluded. – ALollz Sep 09 '19 at 15:28

It seems this change is mostly for presentation purposes so total accuracy of the intervals isn't very relevant to OP, but that's a good point nevertheless – Juan C Sep 09 '19 at 15:31
@ALollz That's right, this is more for presentation purposes. This solution works. – Arturo Sbr Sep 09 '19 at 15:42
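ALollz's point about `349.999` can be handled without losing accuracy when the scores are integers, as they are in the question. For an integer x, x > a holds exactly when x > floor(a), and x <= b holds exactly when x <= floor(b). So flooring both endpoints (instead of rounding them) gives integer intervals that still describe each decile's members exactly. This is a minimal sketch under that integer-data assumption; `floor_interval` is a hypothetical helper name, not a pandas function.

```python
import math
import random

import pandas as pd

random.seed(420)
example = pd.DataFrame({"Score": [random.randrange(350, 1000) for i in range(1000)]})
example["Decile_interval"] = pd.qcut(example["Score"], 10)


def floor_interval(iv: pd.Interval) -> pd.Interval:
    """Hypothetical helper: floor both endpoints, keeping the closed side.

    Assumes integer-valued data: for integer x,
    x > a  <=>  x > floor(a)   and   x <= b  <=>  x <= floor(b),
    so every integer score is still inside its relabeled interval.
    """
    return pd.Interval(math.floor(iv.left), math.floor(iv.right), closed=iv.closed)


new_cats = [floor_interval(iv) for iv in example["Decile_interval"].cat.categories]
# rename_categories keeps the codes, so row-to-decile assignments do not change
example["Decile_interval"] = example["Decile_interval"].cat.rename_categories(new_cats)
```

With flooring, qcut's `349.999` becomes `349`, so a score of 350 is displayed inside `(349, ...]` instead of falling just outside `(350, ...]`. The guarantee only holds for integer scores; with fractional scores, floored labels can again misdescribe values near the edges.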
