I am using Proc HPBIN to split my data into equally-spaced buckets i.e. each bucket has an equal proportion of the total range of the variable.
My issue is when I have extremely skewed data with a large range. Almost all of my datapoints lie in one bucket while there is a couple of observations scattered around the extremes.
I'm wondering if there is a way to force PROC HPBIN to consider the proportion of values in each bin and make sure there is at least e.g. 5% of observations in a bin and to group others?
DATA var1;
DO VAR1 = 1 TO 100;
OUTPUT;
END;
DO VAR1 = 500 TO 505;
OUTPUT;
END;
DO VAR1 = 7000 TO 7015;
OUTPUT;
END;
DO VAR1 = 1000000 TO 1000010;
OUTPUT;
END;
RUN;
/*Use proc hpbin to generate bins of equal width*/
ODS EXCLUDE ALL;
ODS OUTPUT
Mapping = bin_width_results;
PROC HPBIN
DATA=var1
numbin = 15
bucket;
input VAR1 / numbin = 15;
RUN;
ODS EXCLUDE NONE;
Id like to see a way that proc hpbin or other method groups together the bins which are empty and allows at least 5% of proportion per bucket. However, I am not looking to use percentiles in this case (it is another plot on my pdf) because I'd see like to see the spread.