Questions tagged [binning]

Binning is the process of grouping data into "bins"; it is used in statistics and data analysis. For details, see also the Wikipedia article "Data binning".

684 questions
10
votes
2 answers

How to Plot a Pre-Binned Histogram In R

I have a pre-binned frequency table for a rather large dataset. That is, a single column vector of bins and a single column vector of counts associated with those bins. I'd like R to plot a histogram of this data by doing further binning and summing…
Jacob
  • 161
  • 1
  • 5
10
votes
1 answer

Matplotlib: How to make a histogram with bins of equal area?

Given some list of numbers following some arbitrary distribution, how can I define bin positions for matplotlib.pyplot.hist() so that the area in each bin is equal to (or close to) some constant area, A? The area should be calculated by multiplying…
wrkyle
  • 529
  • 1
  • 13
  • 36
10
votes
2 answers

Two-dimensional np.digitize

I have two-dimensional data and I have a bunch of two-dimensional bins generated with scipy.stats.binned_statistic_2d. For each data point, I want the index of the bin it occupies. This is exactly what np.digitize is for, but as far as I can…
Alex
  • 302
  • 3
  • 16
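For the two-dimensional np.digitize question above, one possible approach is to digitize each coordinate against its own edge array. This is only a sketch: the helper name `digitize_2d` and the edge arrays in the usage note are hypothetical, and it assumes the bins form a rectangular grid described by one array of x edges and one array of y edges.

```python
import numpy as np

def digitize_2d(x, y, x_edges, y_edges):
    """Return an (N, 2) array of per-axis bin indices for points (x, y).

    Hypothetical helper: assumes a rectangular grid given by x_edges and y_edges.
    Indices follow np.digitize's convention: 1-based, with 0 meaning "below the
    first edge" and len(edges) meaning "at or above the last edge".
    """
    ix = np.digitize(x, x_edges)  # bin index along the x axis
    iy = np.digitize(y, y_edges)  # bin index along the y axis
    return np.stack([ix, iy], axis=1)
```

For example, with x edges `[0, 1, 2]` and y edges `[0, 10, 20]`, the point (0.5, 15) falls in x bin 1 and y bin 2.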
8
votes
2 answers

Python/Pandas Binning Data Timedelta

I have a DataFrame with two columns userID duration 0 DSm7ysk 03:08:49 1 no51CdJ 00:35:50 2 ... with 'duration' having type timedelta. I have tried using bins = [dt.timedelta(minutes = 0), dt.timedelta(minutes = …
cmf05
  • 401
  • 6
  • 15
8
votes
2 answers

How to use a for loop with a function that needs a string field?

I am using the smbinning R package to compute the information value of the variables included in my dataset. The function smbinning() is pretty simple and has to be used as follows: result = smbinning(df= dataframe, y= "target_variable",…
QuantumGorilla
  • 583
  • 2
  • 10
  • 25
8
votes
2 answers

R: creating a categorical variable from a numerical variable and custom/open-ended/single-valued intervals

I often find myself trying to create a categorical variable from a numerical variable + a user-provided set of ranges. For instance, say that I have a data.frame with a numeric variable df$V and would like to create a new variable df$VCAT such…
Berk U.
  • 7,018
  • 6
  • 44
  • 69
8
votes
3 answers

Binning data in R

I have a vector with around 4000 values. I would just need to bin it into 60 equal intervals for which I would then have to calculate the median (for each of the bins). v<-c(1:4000) V is really just a vector. I read about cut but that needs me to…
user3419669
  • 293
  • 2
  • 4
  • 11
7
votes
2 answers

Pandas pd.cut on Timestamps - "ValueError: bins must increase monotonically"

I am trying to split time series data into labelled segments like this: import pandas as pd import numpy as np # Create example DataFrame of stock values df = pd.DataFrame({ 'ticker':np.repeat( ['aapl','goog','yhoo','msft'], 25 ), …
pyjamas
  • 4,608
  • 5
  • 38
  • 70
7
votes
4 answers

numpy.digitize returns values out of range?

I am using the following code to digitize an array into 16 bins: numpy.digitize(array, bins=numpy.histogram(array, bins=16)[1]) I expect that the output is in the range [1, 16], since there are 16 bins. However, one of the values in the returned…
sandesh247
  • 1,658
  • 1
  • 18
  • 24
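The out-of-range index in the numpy.digitize question above can be reproduced with a short sketch. np.histogram with 16 bins returns 17 edges, and np.digitize treats each interval as half-open, so the maximum value (which equals the last edge) lands in index 17. The helper name and the workaround of digitizing against the interior edges only are assumptions made for illustration, not necessarily what the answers recommend.

```python
import numpy as np

def digitize_with_histogram_edges(values, n_bins=16):
    """Compare raw np.digitize output with a variant limited to 1..n_bins.

    Hypothetical helper for illustration only.
    """
    values = np.asarray(values, dtype=float)
    edges = np.histogram(values, bins=n_bins)[1]  # n_bins + 1 edges
    # Raw call: the maximum equals the last edge, so it gets index n_bins + 1.
    raw = np.digitize(values, edges)
    # Workaround (an assumption): digitize against the interior edges only,
    # then shift by one so that the indices run from 1 to n_bins.
    clipped = np.digitize(values, edges[1:-1]) + 1
    return edges, raw, clipped
```

With the interior edges, the maximum value falls into the last bin, matching the way np.histogram closes its final interval.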
7
votes
2 answers

Split an array into bins of equal numbers

I have an array (not sorted) of N elements. I'd like to keep the original order of N, but instead of the actual elements, I'd like them to have their bin numbers, where N is split into m bins of equal (if N is divisible by m) or nearly equal (N not…
max_max_mir
  • 1,494
  • 3
  • 20
  • 36
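For the equal-count binning question above, a minimal sketch is to rank the elements and map each rank to a bin, which keeps the original order of the array. The function name `equal_count_bin_ids` and the 0-based bin numbering are assumptions; the question excerpt does not say which numbering is wanted.

```python
import numpy as np

def equal_count_bin_ids(values, m):
    """Replace each element by the id (0..m-1) of its equal-count bin.

    Hypothetical helper: bins are formed by rank, so bin sizes differ by at
    most one when len(values) is not divisible by m.
    """
    values = np.asarray(values)
    n = values.size
    order = np.argsort(values, kind="stable")  # positions sorted by value
    ranks = np.empty(n, dtype=np.intp)
    ranks[order] = np.arange(n)                # rank of each original element
    return ranks * m // n                      # map ranks 0..n-1 to bins 0..m-1
```

Ties are broken by position because the sort is stable, so equal values may end up in adjacent bins.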
7
votes
2 answers

What is the fastest way to count elements in an array?

In my models, one of the most repeated tasks to be done is counting the number of each element within an array. The counting is from a closed set, so I know there are X types of elements, and all or some of them populate the array, along with zeros…
EBH
  • 10,350
  • 3
  • 34
  • 59
7
votes
1 answer

Ternary heatmap in R

I'm trying to come up with a way of plotting a ternary heatmap using R. I think ggtern should be able to do the trick, but I don't know how to do a binning function like stat_bin in vanilla ggplot2. Here's what I have so…
phildeutsch
  • 683
  • 1
  • 8
  • 18
7
votes
2 answers

Bin data by (x,y) and summarize

These are the first 10 lines of a huge file I have: (Note that there is only one user in these 10 lines but I've got thousands of users) dput(testd) structure(list(user = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L ), otime = structure(c(10L, 9L, 8L,…
unixsnob
  • 1,685
  • 2
  • 19
  • 45
7
votes
2 answers

2D and 3D Scatter Histograms from arrays in Python

Do you have any idea how I can bin 3 arrays into a histogram? My arrays look like Temperature = [4, 3, 1, 4, 6, 7, 8, 3, 1] Radius = [0, 2, 3, 4, 0, 1, 2, 10, 7] Density = [1, 10, 2, 24, 7, 10, 21, 102,…
Christian
  • 739
  • 1
  • 5
  • 15
6
votes
1 answer

plt.hist() vs np.histogram() - unexpected results

The following lines a1, b1, _ = plt.hist(df['y'], bins='auto') a2, b2 = np.histogram(df['y'], bins='auto') print(a1 == a2) print(b1 == b2) show that all values of a1 are equal to those of a2, and the same for b1 and b2. I then create a plot using…
KOB
  • 4,084
  • 9
  • 44
  • 88