Questions tagged [binning]

Binning is the process of grouping data into "bins"; it is used in statistics and data analysis. For details, see also "Data binning" on Wikipedia.

684 questions
6
votes
1 answer

Create binned variable from results of class interval determination

I want to create a binned variable out of a continuous variable. I want 10 bins, with break points set from whatever results from a jenks classification. How do I assign each value to one of these 10 bins? # dataframe w/ values (AllwdAmt) df <-…
NiuBiBang
  • 628
  • 1
  • 15
  • 30
6
votes
2 answers

pandas binning a list based on qcut of another list

say I have a list: a = [3, 5, 1, 1, 3, 2, 4, 1, 6, 4, 8] and a sub list of a: b = [5, 2, 6, 8] I'd like to obtain bins by pd.qcut(a,2) and count number of values in each bin for list b. That is In[84]: pd.qcut(a,2) Out[84]: Categorical: [[1, 3],…
user2921752
  • 579
  • 5
  • 14
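A minimal sketch of one way to approach this, assuming the goal is to reuse the quantile edges computed from `a` when counting the values of `b`. It relies on `retbins=True` in `pd.qcut` and `labels=False` in `pd.cut`; the variable names `edges`, `b_bins`, and `counts` are illustrative only.

```python
import numpy as np
import pandas as pd

a = [3, 5, 1, 1, 3, 2, 4, 1, 6, 4, 8]
b = [5, 2, 6, 8]

# Quantile-based edges computed from `a`; retbins=True also returns the edges.
_, edges = pd.qcut(a, 2, retbins=True)

# Reuse those edges for `b`. labels=False yields integer bin indices.
# This assumes every value of `b` lies inside the range of `a`
# (values outside it would become NaN and break the counting step).
b_bins = pd.cut(b, bins=edges, include_lowest=True, labels=False)

# Count how many values of `b` fall into each bin.
counts = np.bincount(b_bins, minlength=len(edges) - 1)
print(edges, counts)
```

`include_lowest=True` matters here because the left-most edge equals the minimum of `a`; without it, a value equal to that minimum would fall outside the first bin.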
5
votes
2 answers

Numpy histogram data: Why is the length of bins vector longer than the histogram values vector?

There are two outputs to numpy.histogram: hist: values of the histogram bin_edges: Return the bin edges (length(hist)+1) both are vectors but in the example below, the second vector is of length 101, which is 1 higher than the first vector, which…
develarist
  • 1,224
  • 1
  • 13
  • 34
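The length difference is expected: N bins are delimited by N + 1 edges. The sketch below illustrates this with made-up random data (the data and the name `bin_centers` are assumptions, not taken from the question) and shows one common way to get one x-value per bin for plotting.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=1000)  # illustrative data only

hist, bin_edges = np.histogram(data, bins=100)

# 100 bins need 101 edges: bin i spans bin_edges[i] to bin_edges[i + 1].
print(len(hist), len(bin_edges))

# Midpoints give one x-value per bin, matching the length of `hist`.
bin_centers = (bin_edges[:-1] + bin_edges[1:]) / 2
```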
5
votes
1 answer

How to round off labels in cut function in R

I am trying to round off my labels from the cut function in R using the dig.lab argument. I have set it to 20, but I get a lot of decimal places after the number in the labels, e.g. (114126.30000000001746,5248999]. And if I reduce the value of dig.lab to 5, the…
rapunzel
  • 53
  • 1
  • 5
5
votes
2 answers

How to bin column of floats with pandas

This code was working until I upgraded from Python 2.x to 3.x. I have a df consisting of 3 columns: ipk1, ipk2, ipk3, each containing float numbers from 0 to 4.0. I would like to bin them into strings. The data looks something like this: …
yuliansen
  • 470
  • 2
  • 14
  • 29
5
votes
2 answers

Pandas DataFrame: mean of column B values within column A windows

If I have a pandas DataFrame in Python such as follows: import numpy as np import pandas as pd a = np.random.uniform(0,10,20) b = np.random.uniform(0,1,20) data = np.vstack([a,b]).T df = pd.DataFrame(data) df.columns =…
user8188120
  • 883
  • 1
  • 15
  • 30
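A hedged sketch of the general pattern, assuming the columns are named `a` and `b` (the excerpt elides the real names) and that fixed, non-overlapping windows are wanted. The window edges below are hypothetical.

```python
import pandas as pd

# Small illustrative frame; column names "a" and "b" are assumptions.
df = pd.DataFrame({
    "a": [0.5, 1.5, 2.5, 3.5, 4.5, 5.5],
    "b": [1.0, 3.0, 5.0, 7.0, 9.0, 11.0],
})

edges = [0, 2, 4, 6]  # hypothetical window boundaries on column "a"

# Label each row with the window its "a" value falls into ...
windows = pd.cut(df["a"], bins=edges)

# ... then average "b" within each window.
means = df.groupby(windows, observed=False)["b"].mean()
print(means)
```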
5
votes
2 answers

Python: Binning based on 2 columns in Pandas

Looking for a quick and elegant way to bin based on 2 columns in Pandas. Here's my data frame filename height width 0 shopfronts_23092017_3_285.jpg 750.0 560.0 1 shopfronts_200.jpg …
bsrcube
  • 83
  • 1
  • 7
5
votes
1 answer

binning data via DecisionTreeClassifier sklearn?

suppose I have a data set: X y 20 0 22 0 24 1 27 0 30 1 40 1 20 0 ... I try to discretize X into few bins by minimizing the entropy. so I did the following: clf =…
user6396
  • 1,832
  • 6
  • 23
  • 38
5
votes
1 answer

How do I efficiently bin values into overlapping bins using Pandas?

I would like to bin all the values from a column of type float into bins that are overlapping. The resulting column could be a series of 1-D vectors with bools - one vector for each value from the original column. The resulting vectors contain True…
Sergey Zakharov
  • 1,493
  • 3
  • 21
  • 40
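One possible approach, sketched with NumPy broadcasting: compare every value against every bin's lower and upper bound at once, which yields one boolean vector per value. The bin bounds and the half-open convention `[low, high)` are assumptions for illustration.

```python
import numpy as np
import pandas as pd

values = pd.Series([0.5, 1.5, 2.5, 3.5])

# Hypothetical overlapping bins, each covering [low, high).
lows = np.array([0.0, 1.0, 2.0])
highs = np.array([2.0, 3.0, 4.0])

# Column vector of values against row vectors of bounds:
# the result has shape (n_values, n_bins).
x = values.to_numpy()[:, None]
membership = (x >= lows) & (x < highs)

# Row i is the boolean vector for values[i].
print(membership)
```

Because the comparison is vectorized, no Python-level loop over values or bins is needed; memory use grows with `n_values * n_bins`.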
5
votes
1 answer

Creating a Bin for NaN values

I am trying to do some data analysis and the idea is to use the autobinning command to create optimal bins, calculate the WOE (Weight of evidence) value for each bin and then replace the original values that belong to each bin with the respective…
Man Gou
  • 113
  • 1
  • 5
5
votes
0 answers

The Thresholded Histogram in Python -- Force each bin to have at least N objects

In the typical histogram created with numpy.histogram or matplotlib.pyplot.hist, the bins are of uniform width or the user specifies his/her own bin edges. There are lots of choices about optimal bin width -- say sqrt(sample size). Sometimes, there…
quantumflash
  • 691
  • 2
  • 5
  • 16
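A rough sketch of one way this could be done, not a standard NumPy feature: start from uniform bins and greedily merge neighbours, left to right, until each merged bin holds at least `min_count` values. The function name `merge_sparse_bins` and its parameters are hypothetical.

```python
import numpy as np

def merge_sparse_bins(values, n_bins, min_count):
    """Greedy left-to-right merge of uniform bins (illustrative helper)."""
    counts, edges = np.histogram(values, bins=n_bins)
    new_edges = [edges[0]]
    new_counts = []
    running = 0
    for count, right_edge in zip(counts, edges[1:]):
        running += count
        if running >= min_count:
            # Enough values collected: close the merged bin here.
            new_edges.append(right_edge)
            new_counts.append(running)
            running = 0
    if new_edges[-1] != edges[-1]:
        # Leftover bins on the right did not reach min_count.
        if new_counts:
            # Fold them into the last merged bin.
            new_counts[-1] += running
            new_edges[-1] = edges[-1]
        else:
            # Too few values overall: fall back to a single bin.
            new_counts.append(running)
            new_edges.append(edges[-1])
    return np.asarray(new_counts), np.asarray(new_edges)

values = [0, 0.1, 0.2, 0.3, 5, 9, 9.1, 9.2, 9.3, 10]
counts, edges = merge_sparse_bins(values, n_bins=10, min_count=3)
print(counts, edges)
```

The resulting bins are no longer of uniform width, so dividing the counts by `np.diff(edges)` may be worth considering when the result is plotted as a density.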
5
votes
1 answer

bin 3d points into 3d bins in python

How can I bin 3d points into 3d bins? Is there a multi dimensional version for np.digitize? I can use np.digitize separately for each dimension, like here. Is there a better solution? Thanks!
Noam Peled
  • 4,484
  • 5
  • 43
  • 48
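For counting points per 3D bin, `np.histogramdd` is one multi-dimensional option. The sketch below uses made-up uniform points and a hypothetical 4 x 4 x 4 grid; the per-point bin indices are still obtained with `np.digitize` on each axis, reusing the edges that `np.histogramdd` returns.

```python
import numpy as np

rng = np.random.default_rng(0)
points = rng.uniform(0.0, 1.0, size=(500, 3))  # illustrative (N, 3) points

# Counts per 3D bin; `edges` is a list with one edge array per dimension.
counts, edges = np.histogramdd(points, bins=(4, 4, 4), range=[(0, 1)] * 3)
print(counts.shape)

# Per-point bin index along each axis. Passing only the inner edges
# makes np.digitize return indices in 0..3 for this grid.
idx = np.stack(
    [np.digitize(points[:, d], edges[d][1:-1]) for d in range(3)],
    axis=1,
)
```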
5
votes
1 answer

pandas - add a column with value based on existing one (bins, qcut)

I am slowly moving from R to python + pandas, and I am facing a problem I cannot solve... I need to discretize values from one column, by assigning them to bins and adding a column with those bin names to original DataFrame. I am trying to use…
Paweł Rumian
  • 3,676
  • 3
  • 21
  • 27
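A minimal sketch of the usual pattern, assuming a column named `value` and made-up bin edges and names (none of these come from the question): `pd.cut` assigns fixed-edge bins, `pd.qcut` assigns quantile-based bins, and either result can be stored as a new column.

```python
import pandas as pd

df = pd.DataFrame({"value": [1, 7, 12, 18, 25]})  # illustrative data

# Fixed edges with one name per bin (hypothetical edges and names).
edges = [0, 10, 20, 30]
names = ["low", "medium", "high"]
df["bin"] = pd.cut(df["value"], bins=edges, labels=names)

# Quantile-based alternative: two bins split at the median.
df["quantile_bin"] = pd.qcut(df["value"], q=2, labels=["lower", "upper"])

print(df)
```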
4
votes
2 answers

Selecting between duplicate data in a data frame

Earlier I asked a question about extracting duplicate lines from a data frame. I now need to run a script to decide which of these duplicates to keep in my final data set. Duplicate entries in this data set have the same 'Assay' and 'Sample'…
Sam Globus
  • 585
  • 2
  • 5
  • 17
4
votes
1 answer

Inaccurate mapping of gradient fill colours with bin counts in `geom_hexbin` in `ggplot2`

I am trying to plot a binned scatter plot as below using ggplot2. library(ggplot2) bks = seq(from = 0, to = 10000, by = 1000) d <- ggplot(diamonds, aes(carat, price)) + theme_bw() d + geom_point(alpha = 0.01) When I use geom_hexbin, the counts in…
Crops
  • 5,024
  • 5
  • 38
  • 65