I have X,Y data which I would like to bin according to the X values. However, I would like to determine the optimal number of X bins that satisfies a condition based on the resulting bin widths and the average Y of each bin. For example, if I have
X=[2,3,4,5,6,7,8,9,10]
Y=[120,140,143,124,150,140,180,190,200]
I would like to determine the best number of X bins that will satisfy this condition: (average Y of a bin) / (8 * width of that X bin) should be above 20, but as close as possible to 20. The bins should also be integer-valued, e.g., [1,2,..]. I am currently using:
bin_means, bin_edges, binnumber = binned_statistic(X, Y, statistic='mean', bins=bins)
with bins being pre-defined. However, I would like an algorithm that can determine the optimal bins for me before using this. It is easy to work out by hand for a small dataset, but for hundreds of points it becomes time-consuming.
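To make the condition concrete, here is a minimal brute-force sketch of the kind of search I have in mind. It is not a definitive solution: it assumes equal-width bins with integer edges, ignores empty bins, and measures "closeness" as the mean excess of the ratio over 20. Those choices, and the names bin_scores and best_integer_width, are only my assumptions for illustration.

```python
import math


def bin_scores(x, y, edges, factor=8):
    """Return mean(Y) / (factor * bin width) for every non-empty bin.

    Bins are half-open intervals [lo, hi). Empty bins are skipped
    (an assumption; they could also be treated as failures).
    """
    scores = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        ys = [yi for xi, yi in zip(x, y) if lo <= xi < hi]
        if ys:
            scores.append(sum(ys) / len(ys) / (factor * (hi - lo)))
    return scores


def best_integer_width(x, y, target=20, factor=8):
    """Try every integer bin width and keep the best admissible one.

    A width is admissible if every non-empty bin scores strictly above
    `target`. Among admissible widths, the one with the smallest mean
    excess over `target` wins (an assumed definition of "closest").
    Returns (width, edges), or None if no width is admissible.
    """
    lo = math.floor(min(x))
    hi = math.floor(max(x)) + 1  # max(x) lies strictly inside the last bin
    best = None
    for width in range(1, hi - lo + 1):
        n_bins = math.ceil((hi - lo) / width)
        edges = [lo + k * width for k in range(n_bins + 1)]
        scores = bin_scores(x, y, edges, factor)
        if min(scores) <= target:
            continue  # at least one bin violates the condition
        excess = sum(s - target for s in scores) / len(scores)
        if best is None or excess < best[0]:
            best = (excess, width, edges)
    return None if best is None else (best[1], best[2])
```

If a width is found, the returned edges could be passed directly as bins=edges to binned_statistic. The function returns None when no integer width satisfies the condition for every bin. This only searches equal-width bins, so I am still looking for something that handles unequal bins or scales better.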
Thank you