How can I get rid of spikes ("sparks") in a discrete data set, but in a smoothed-out manner?
Take for instance
There are two sparks at about 20000, but the next-highest value, at 600, should also be considered a spark.
I've managed to push the very high ones to zero by mapping the normalized data through a beta distribution:
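# Beta(2, 5) density; it is zero at x = 1, which suppresses the largest values
a = 2
b = 5
beta_dist = RealDistribution('beta', [a, b])
# rescale the data by 19968 before evaluating the density
f(x) = x / 19968
normalized_insertions = [f(i) for i in insertions]
# pair each normalized value with the beta density at that point
insertions_pairs = [(i, beta_dist.distribution_function(i)) for i in normalized_insertions]
# overlay the data points on the density curve
plot_b = beta_dist.plot()
show(list_plot(insertions_pairs)+plot_b)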
I have no idea how to handle the lower ones. The maximum should be reached at 100; perhaps the parameters of the beta distribution need a little more tweaking?
Currently, it looks like this:
If possible, please use Sage as a reference for your explanations.
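On the parameter question: for a, b > 1 the Beta(a, b) density peaks at (a - 1) / (a + b - 2), so b can be solved for instead of guessed. Below is a rough sketch in plain Python of what that might look like, assuming the data is still divided by 19968 and a stays at 2. The names `beta_params_for_mode`, `beta_pdf` and `weight` are made up for illustration and are not Sage functions, and scaling the density so that it equals 1 at the peak is my own assumption about what "maximum at 100" should mean.

```python
from math import exp, lgamma, log

SCALE = 19968.0       # same divisor as in f(x) above
TARGET_PEAK = 100.0   # raw value at which the density should peak


def beta_params_for_mode(mode, a=2.0):
    """Return (a, b) such that the Beta(a, b) density peaks at `mode`.

    Assumes 0 < mode < 1 and a > 1. Derived from
    mode = (a - 1) / (a + b - 2), solved for b.
    """
    b = (a - 1.0) / mode - a + 2.0
    return a, b


def beta_pdf(x, a, b):
    """Beta(a, b) density for a, b > 1, computed in log space.

    lgamma is used because gamma(a + b) overflows a float for large b.
    """
    if x <= 0.0 or x >= 1.0:
        return 0.0  # the density vanishes at both ends when a, b > 1
    log_norm = lgamma(a + b) - lgamma(a) - lgamma(b)
    return exp(log_norm + (a - 1.0) * log(x) + (b - 1.0) * log(1.0 - x))


mode = TARGET_PEAK / SCALE
a, b = beta_params_for_mode(mode)
peak_density = beta_pdf(mode, a, b)


def weight(raw_value):
    """Density at the rescaled value, relative to the density at the peak."""
    return beta_pdf(raw_value / SCALE, a, b) / peak_density


for raw in (50, 100, 600, 19968):
    print(raw, weight(raw))
```

With a = 2 this gives b of roughly 199.68, so in Sage the same values could presumably be tried as `RealDistribution('beta', [2, 199.68])`. A value of 600 then gets a much smaller weight than 100, and anything at or above 19968 gets zero.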