Questions tagged [binning]

Binning is the process of grouping data into "bins"; it is used in statistics and data analysis. For details, see also Data binning - Wikipedia, the free encyclopedia

684 questions
15
votes
1 answer

Pandas pd.cut() - binning datetime column / series

Attempting to bin using pd.cut(), but it is fairly elaborate. A colleague sends me multiple files with report dates such as: '03-16-2017 to 03-22-2017', '03-23-2017 to 03-29-2017', '03-30-2017 to 04-05-2017'. They are all combined into a single…
Arthur D. Howland
  • 4,363
  • 3
  • 21
  • 31
15
votes
4 answers

pandas qcut not putting equal number of observations into each bin

I have a data frame, from which I can select a column (series) as follows: df: value_rank 275488 90 275490 35 275491 60 275492 23 275493 23 275494 34 275495 75 275496 …
Carl
  • 598
  • 2
  • 11
  • 25
15
votes
3 answers

Howto bin series of float values into histogram in Python?

I have a set of float values (always less than 0), which I want to bin into a histogram, i.e. each bar in the histogram contains a range of values [0, 0.150). The data I have looks like this: 0.000 0.005 0.124 0.000 0.004 0.000 0.111 0.112 With my code below…
neversaint
  • 60,904
  • 137
  • 310
  • 477
13
votes
3 answers

Pandas Dataframe - Bin on multiple columns & get statistics on another column

Problem I have a target variable x and some additional variables A and B. I want to calculate averages (and other statistics) of x when certain conditions for A and B are met. A real world example would be to calculate the average air temperature…
Fred S
  • 1,421
  • 6
  • 21
  • 37
13
votes
3 answers

python bin data and return bin midpoint (maybe using pandas.cut and qcut)

Can I make the pandas cut/qcut functions return the bin endpoint or bin midpoint instead of a string bin label? Currently pd.cut(pd.Series(np.arange(11)), bins = 5) 0 (-0.01, 2] 1 (-0.01, 2] 2 (-0.01, 2] 3 (2, 4] 4 (2,…
jf328
  • 6,841
  • 10
  • 58
  • 82
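
For the midpoint question above, one possible approach is sketched here. It assumes explicit bin edges are supplied and that every value falls inside them (no NaN results); `bin_midpoints` is a hypothetical helper name, not part of pandas. It asks `pd.cut` for integer bin codes with `labels=False` and uses them to index an array of bin centers.

```python
import numpy as np
import pandas as pd

def bin_midpoints(values, edges):
    """Map each value to the midpoint of the bin it falls in.

    Hypothetical helper: assumes all values lie within [edges[0], edges[-1]].
    """
    edges = np.asarray(edges, dtype=float)
    centers = (edges[:-1] + edges[1:]) / 2
    # labels=False makes pd.cut return integer bin indices instead of Interval labels
    codes = pd.cut(values, bins=edges, labels=False, include_lowest=True)
    return pd.Series(centers[np.asarray(codes)], index=pd.Series(values).index)

values = pd.Series(np.arange(11))
midpoints = bin_midpoints(values, np.linspace(0, 10, 6))
```

Values outside the edges would produce NaN codes and break the indexing step, so such inputs would need to be filtered or clipped first.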
13
votes
2 answers

assigning points to bins

What is a good way to bin numerical values into a certain range? For example, suppose I have a list of values and I want to bin them into N bins by their range. Right now, I do something like this: from scipy import * num_bins = 3 # number of bins…
user248237
12
votes
3 answers

Pandas - Group/bins of data per longitude/latitude

I have a bunch of geographical data as below. I would like to group the data by bins of .2 degrees in longitude AND .2 degrees in latitude. While it is trivial to do for either latitude or longitude, what is the most appropriate way of doing this for…
tog
  • 887
  • 1
  • 12
  • 22
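
For the longitude/latitude question above, a minimal sketch of grouping on both axes at once. The column names `lat`, `lon`, and `value`, the sample data, and the helper `grid_bin` are assumptions made for illustration. Each coordinate is converted to an integer cell index with a floor division by the step, and the two indices are passed to `groupby` together.

```python
import numpy as np
import pandas as pd

def grid_bin(df, step=0.2, lat_col="lat", lon_col="lon"):
    """Add integer grid-cell indices for latitude and longitude (hypothetical helper)."""
    out = df.copy()
    out["lat_bin"] = np.floor(out[lat_col] / step).astype(int)
    out["lon_bin"] = np.floor(out[lon_col] / step).astype(int)
    return out

# Made-up sample data, not taken from the question
df = pd.DataFrame({
    "lat": [10.05, 10.15, 10.25, 10.05],
    "lon": [20.05, 20.15, 20.05, 20.45],
    "value": [1.0, 3.0, 5.0, 7.0],
})

binned = grid_bin(df)
# Grouping on both indices yields one row per occupied 0.2 x 0.2 degree cell
means = binned.groupby(["lat_bin", "lon_bin"])["value"].mean()
```

Integer cell indices are used as group keys rather than floating-point bin edges, which avoids grouping problems caused by rounding error. Multiplying an index by `step` recovers the lower edge of its cell.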
12
votes
3 answers

Numpy rebinning a 2D array

I am looking for a fast formulation to do a numerical binning of a 2D numpy array. By binning I mean calculating submatrix averages or cumulative values. For example, x = numpy.arange(16).reshape(4, 4) would be split into 4 submatrices of 2x2 each…
user1187727
  • 409
  • 2
  • 9
  • 19
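
For the rebinning question above, a common reshape-based sketch. It assumes the array dimensions divide evenly by the block size; `rebin_2d` is a hypothetical helper name. The array is reshaped so that each block gets its own pair of axes, and those axes are then reduced.

```python
import numpy as np

def rebin_2d(a, block_shape, reducer=np.mean):
    """Reduce non-overlapping blocks of a 2D array (hypothetical helper)."""
    rows, cols = a.shape
    br, bc = block_shape
    if rows % br or cols % bc:
        raise ValueError("array shape must be divisible by block_shape")
    # Axes 1 and 3 index positions inside each block; axes 0 and 2 index the blocks
    blocks = a.reshape(rows // br, br, cols // bc, bc)
    return reducer(blocks, axis=(1, 3))

x = np.arange(16).reshape(4, 4)
block_means = rebin_2d(x, (2, 2))          # submatrix averages
block_sums = rebin_2d(x, (2, 2), np.sum)   # cumulative values per block
```

The reshape does not copy a contiguous array, so the cost is dominated by the single reduction pass.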
12
votes
4 answers

Python: how to make an histogram with equally *sized* bins

I have a set of data, and want to make a histogram of it. I need the bins to have the same size, by which I mean that they must contain the same number of objects, rather than the more common (numpy.histogram) problem of having equally spaced…
astabada
  • 1,029
  • 4
  • 13
  • 26
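
For the equal-count question above, a minimal sketch that places bin edges at evenly spaced quantiles of the data and passes them to `numpy.histogram`. The helper name `equal_count_edges` and the random sample data are assumptions for illustration. Heavily tied data can produce repeated edges and uneven counts.

```python
import numpy as np

def equal_count_edges(data, n_bins):
    """Return bin edges at evenly spaced quantiles (hypothetical helper)."""
    data = np.asarray(data, dtype=float)
    return np.quantile(data, np.linspace(0, 1, n_bins + 1))

rng = np.random.default_rng(0)
data = rng.normal(size=1000)  # made-up sample data

edges = equal_count_edges(data, 10)
# Bins are narrow where the data are dense and wide in the tails
counts, _ = np.histogram(data, bins=edges)
```

Because the bins have unequal widths, plotting with `density=True` (or dividing counts by bin width) gives bar heights that are comparable across bins.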
11
votes
1 answer

How does cut with breaks work in R

I am trying to understand how cut divides and creates intervals; I tried ?cut but can't figure out how cut in R works. Here is my problem: set.seed(111) data1 <- seq(1,10, by=1) data1 [1] 1 2 3 4 5 6 7 8 9 10 data1cut<-…
deepseefan
  • 3,701
  • 3
  • 18
  • 31
11
votes
4 answers

Binning an array in javascript for a histogram

I have the array below in JavaScript, which I need to bin into 20 buckets. The data values are between 0 and 1, so the bin size would be .05. I feel like there should be a function out there that takes two arguments, an array and a bin size, but I cannot…
NodeJS_dev
  • 231
  • 1
  • 5
  • 12
11
votes
1 answer

Group/bin/bucket data in R and get count per bucket and sum of values per bucket

I wish to bucket/group/bin data : C1 C2 C3 49488.01172 0.0512 54000 268221.1563 0.0128 34399 34775.96094 0.0128 54444 13046.98047 0.07241 61000 2121699.75 0.00453 78921 71155.09375 0.0181 …
Freewill
  • 413
  • 2
  • 6
  • 18
11
votes
3 answers

Binning a numeric variable

I have a vector X that contains positive numbers that I want to bin/discretize. For this vector, I want the numbers [0, 10) to show up just as they exist in the vector, but numbers [10,∞) to be 10+. I'm using: x <-…
mcpeterson
  • 4,894
  • 4
  • 24
  • 24
11
votes
1 answer

Extending numpy.digitize to multi-dimensional data

I have a set of large arrays (about 6 million elements each) on which I basically want to perform a np.digitize, but over multiple axes. I am looking for suggestions both on how to do this effectively and on how to store the results. I need…
Brian Larsen
  • 1,740
  • 16
  • 28
10
votes
4 answers

Mathematica fast 2D binning algorithm

I am having some trouble developing a suitably fast binning algorithm in Mathematica. I have a large (~100k elements) data set of the form T={{x1,y1,z1},{x2,y2,z2},....} and I want to bin it into a 2D array of around 100x100 bins, with the bin…
Ben Farmer
  • 2,387
  • 1
  • 25
  • 44