Questions tagged [binning]

Binning is the process of grouping data into "bins"; it is used in statistics and data analysis. For details, see Data binning on Wikipedia.

684 questions
-1
votes
1 answer

Binning a dataframe when a condition matches in a second dataframe

Good morning all. I want to create a binning column in my main dataframe using data from a second one. Dataframe#1 has "Runner ID" and "Cumulative Distance" columns. Dataframe#2 has "Runner ID", "Section Start" and "Section Name" columns. I'm trying…
GusRo
  • 7
  • 4
-1
votes
1 answer

Binning different lengths in R

input1 dput(a1  100 200 + a1  250 270 + a1  333 340 - a2  450 460 +) input2 dput(a1  101 106 + a1  112 117 + a1  258 259 + a1  258 259 + a1  258 259 + a1  258 259 + a1  258 259 + a1  258 259 + a1  258 259 + a1  258 259 + a1  258 259 + a1  260 262…
repinementer
  • 723
  • 3
  • 8
  • 11
-1
votes
1 answer

Python_Cumulative sum based on two conditions

I'm trying to compute the cumulative sum in Python based on two different conditions. As you can see in the attached image, the Calculation column takes the same value as the Number column as long as the Cat1 and Cat2 columns don't change. Once…
GusRo
  • 7
  • 4
-1
votes
2 answers

R Finding standard deviation of other column based on binned groups

Hi, I have a dataframe with three columns: a name column, a numerical column A and a numerical column B. What I am trying to do is bin the rows by their values in column A, and then find the standard deviation of each binned group's values in column…
hamhung
  • 53
  • 8
-1
votes
1 answer

Creating a function in python to bin data

I have a dataset with 1000 rows and 2 columns: one with CustomerID and the other with values. I need to create a function to bin the values into 5 groups. The binning process I need to use is as follows. All values equal to 1 will be given a score of 1. For…
yponde
  • 61
  • 5
-1
votes
1 answer

How is age classified as a categorical variable?

OK, this question is very basic, but I can't get it, so I need your help. I understand the idea of splitting age into categories. For example: I don't understand how the model knows that the 30< category comes before the 31-45 category, why the 31-45…
Amit S
  • 225
  • 6
  • 16
-1
votes
2 answers

Use of lapply to identify which bin a particular value lies in

The data set is this badData <- list(c(296,310), c(330,335), c(350,565)) df <- data.frame(wavelength = seq(300,360,5.008667), reflectance = seq(-1,-61,-5.008667)) df wavelength reflectance 300.0000 -1.000000 305.0087…
ashleych
  • 1,042
  • 8
  • 25
-1
votes
1 answer

Binning Values from 2 variables using the same mean for each bin

How do I bin data from two different groups into bins centered around the same value? As a toy example, A(:,1) = [0.05:0.05:0.80]'; A(:,2) = [ones(7,1); [0.6; 0.6; 0.4]; zeros(6,1)]; B(:,1) = [0.15:0.1:0.95]'; B(:,2) = [ones(4,1); [0.8; 0.8; 0.2];…
BenJHC
  • 89
  • 6
-1
votes
1 answer

R arrange columns as per colSums and bin multiple columns under same category

I have a data frame where 'Earning' is numeric and A,B,C,D,E... are binary vector. Earning A B C D E ...**1000 such binary vector columns** 21 1 0 0 1 1 45 0 0 0 1 1 67 0 0 0 1 1 23 0 0 0 0 1 44 0 0 0 1 1 77 1 1 0 0 1 …
ausworli
  • 479
  • 1
  • 4
  • 10
-1
votes
1 answer

Binning data in Python

I'm working very hard to understand how to bin data in Python. So far I have worked out how to get the edges using: edges = pylab.hist(data, bins=10)[1] I'm not sure if this is the most ideal method, but it worked! Gives me a list of 11 numbers…
user3023715
  • 1,539
  • 2
  • 11
  • 12
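A minimal sketch of one way to get both the edges and a per-value bin index without drawing a plot, assuming plain NumPy is acceptable in place of `pylab.hist`. The `data` array here is hypothetical stand-in data, not the asker's.

```python
import numpy as np

# Hypothetical data standing in for the asker's `data`
rng = np.random.default_rng(0)
data = rng.normal(size=100)

# counts has 10 entries, edges has 11 (the boundaries of the 10 bins)
counts, edges = np.histogram(data, bins=10)

# Assign each value a bin index 0..9. Passing only the interior edges
# makes the maximum value fall into the last bin, as np.histogram does.
bin_index = np.digitize(data, edges[1:-1])

print(edges)
print(np.bincount(bin_index, minlength=10))  # same totals as `counts`
```

`np.histogram` returns the same kind of edge array as the second element of `pylab.hist`, and `np.digitize` then tells you which bin each individual value belongs to.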
-1
votes
1 answer

Binning Imbalanced Data

I have an imbalanced numeric data set. I need to bin the data into 8 bins; however, if I set the bins to have equal size, all my data would fall into only two bins and the rest in the middle would be empty. Is there a…
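One common option for skewed data is equal-frequency (quantile) binning, where the edges are placed so that each bin holds roughly the same number of points. The sketch below assumes that approach fits the use case, which the truncated question does not confirm; the lognormal `data` array is hypothetical.

```python
import numpy as np

# Hypothetical skewed data; replace with the real values
rng = np.random.default_rng(0)
data = rng.lognormal(mean=0.0, sigma=1.5, size=800)

n_bins = 8

# Place the 9 edges at evenly spaced quantiles instead of evenly spaced values
edges = np.quantile(data, np.linspace(0.0, 1.0, n_bins + 1))

# Bin index 0..7 for every value, using only the interior edges
bin_index = np.digitize(data, edges[1:-1])

# Number of points per bin; roughly equal by construction
counts = np.bincount(bin_index, minlength=n_bins)
print(counts)
```

The bins end up with very different widths, which is the trade-off for equal counts. If the data contains many tied values, several quantile edges can coincide and some bins may come out empty.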
-1
votes
1 answer

How to pass a numeric feature having a large number of unique values to Random Forest regression algorithm in PySpark MLlib?

I have a dataset with a numeric feature column that has a large number of unique values (on the order of 10,000). I know that when we generate the model for the Random Forest regression algorithm in PySpark, we pass a parameter maxBins which should be…
Jason Donnald
  • 2,256
  • 9
  • 36
  • 49
-1
votes
1 answer

counting the number of samples in a numpy array

I have a numpy array of samples, [0, 0, 2.5, -5.0, ...]. In my case all samples are multiples of 2.5. I want to know how many times each sample occurs. More or less like numpy.histogram. In this case something like: [[-5.0, 1], [0, 2], [2.5, 1], ...].
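A minimal sketch of one way to do this, assuming `np.unique` with `return_counts=True` is acceptable. The `samples` array below contains only the four values shown in the question; the elided rest is left out.

```python
import numpy as np

# The four sample values quoted in the question
samples = np.array([0.0, 0.0, 2.5, -5.0])

# Sorted distinct values and the number of times each one occurs
values, counts = np.unique(samples, return_counts=True)

# Pair each value with its count, in the [[value, count], ...] shape asked for
value_counts = [[float(v), int(c)] for v, c in zip(values, counts)]
print(value_counts)  # [[-5.0, 1], [0.0, 2], [2.5, 1]]
```

Multiples of 2.5 are exactly representable as floats, so exact matching works here. For values that come out of floating-point arithmetic, rounding them first (for example with `np.round`) may be needed before counting.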
-1
votes
1 answer

How to read data from a URL and count frequencies into bins given in another URL?

I am working on an assignment where I have 2 URLs. The first has 3 columns: the first column is the lower bin boundary, the second is the upper bin boundary, and the third is irrelevant for this question and just contains another number. It looks…
-1
votes
1 answer

R - Symmetry with hexbin

I plot two hexbin graphs with R (with package 'hexbin') from a data file with two columns, gr and ug. The first plot: gr as a function of ug. The second plot: ug as a function of gr. Why aren't they perfectly symmetrical? Thanks in advance.
JWheatP
  • 43
  • 5