I'm trying to fit a curve to estimate the number of likes a news article has as a function of the article's age. I have a dataset with 5000 data points. The x-axis is the time since publication in hours and the y-axis is the number of shares the article has.
The constraints on the function are that its derivative must never be negative (an article does not lose likes as it gets older) and that y must not be larger than 0 at x = 0.
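Written as conditions on the fitted curve, this means $y'(x) \ge 0$ for all $x \ge 0$ and $y(0) \le 0$. As an illustration only (these shapes are an assumption, not something derived from the data), two families that meet both conditions by construction whenever $a \ge 0$ and $b > 0$ are:

$$
y(x) = a \ln\!\left(1 + \frac{x}{b}\right), \qquad y(0) = 0, \qquad y'(x) = \frac{a}{b + x} \ge 0
$$

$$
y(x) = a\left(1 - e^{-x/b}\right), \qquad y(0) = 0, \qquad y'(x) = \frac{a}{b}\, e^{-x/b} \ge 0
$$

The first grows without bound, while the second levels off at $a$, which might suit the data better if articles eventually stop collecting likes.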
The only way I managed to get something like this was to use the function a*log(x-1)/log(b)+c and to apply it only to roughly the first 240 hours. If I use a longer time span, the fit turns into an essentially linear estimate with y(0) > 0. I also had to remove all data points above 500; otherwise the curve ends up far too high.
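Writing out the model a*log(x-1)/log(b)+c shows two structural problems with it (MATLAB's `log` is the natural logarithm):

$$
\frac{a \ln(x-1)}{\ln b} + c = k \ln(x-1) + c, \qquad k = \frac{a}{\ln b}
$$

Only the ratio $k$ affects the curve, so $a$ and $b$ cannot be determined separately. In addition, $\ln(x-1)$ has no real value for $x \le 1$ (it tends to $-\infty$ as $x \to 1^{+}$), so the condition at $x = 0$ cannot even be evaluated for this model.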
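I used the following MATLAB code:
```matlab
% x = B(:,2): hours since publication, y = B(:,1): likes
modelFunc = @(p,x) p(1) .* log(x-1) ./ log(p(2)) + p(3);
% one starting value per parameter (the model uses p(1)..p(3))
coef = nlinfit(B(:,2), B(:,1), modelFunc, [1 2 0])
```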
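A sketch of one alternative, written in Python with only the standard library and assuming a model of the form y = a ln(1 + x/b) with a ≥ 0 and b > 0 (which gives y(0) = 0 and a non-negative derivative automatically). Instead of the hard cut-off at 500, it uses a Huber loss, which keeps the very high points but limits how far they can pull the curve. For a fixed b the model is linear in a, so a is found by iteratively reweighted least squares (clipped at 0), and b is picked by a simple grid search. The names `fit_log_model`, `fit_scale`, `b_grid` and `delta` (the Huber threshold, in likes) are illustrative, and the data in the example is synthetic.

```python
import math


def huber(r, delta):
    """Huber loss: quadratic for |r| <= delta, linear beyond it."""
    r = abs(r)
    return 0.5 * r * r if r <= delta else delta * (r - 0.5 * delta)


def fit_scale(f, ys, delta, iters=50):
    """Best a >= 0 for y ~ a * f under Huber loss, via IRLS."""
    a = 0.0
    for _ in range(iters):
        # Huber weights: 1 for small residuals, delta/|r| for large ones.
        w = []
        for fi, y in zip(f, ys):
            r = abs(y - a * fi)
            w.append(1.0 if r <= delta else delta / r)
        num = sum(wi * fi * y for wi, fi, y in zip(w, f, ys))
        den = sum(wi * fi * fi for wi, fi in zip(w, f))
        # Clipping at 0 keeps the fitted curve non-decreasing.
        a = max(num / den, 0.0) if den > 0 else 0.0
    return a


def fit_log_model(xs, ys, b_grid, delta):
    """Return (a, b, loss) for y = a * ln(1 + x / b), b chosen from b_grid."""
    best = None
    for b in b_grid:
        f = [math.log1p(x / b) for x in xs]  # f(0) = 0 and f increases with x
        a = fit_scale(f, ys, delta)
        loss = sum(huber(y - a * fi, delta) for fi, y in zip(f, ys))
        if best is None or loss < best[2]:
            best = (a, b, loss)
    return best


if __name__ == "__main__":
    # Synthetic data only: y = 30 * ln(1 + x / 10) plus one "viral" outlier.
    xs = [5 * i for i in range(41)]  # 0..200 hours
    ys = [30 * math.log1p(x / 10) for x in xs]
    ys[20] += 5000  # the point at x = 100 h
    a, b, _ = fit_log_model(xs, ys, [1, 2, 5, 10, 20, 50, 100], delta=5.0)
    print(f"a = {a:.2f}, b = {b}")
```

In MATLAB, `nlinfit` also accepts a robust weighting option (`statset('RobustWgtFun','huber')` passed as its options argument), which would be the closest equivalent in the original code; it does not enforce a ≥ 0 by itself, though.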
But this approach has several problems that make the result close to useless:
- I assume that the growth is logarithmic.
- I picked the time cut-off arbitrarily, just to make the graph "look good".
- I picked the cut-off for "abnormally high" like counts arbitrarily as well.
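If the logarithmic shape is the assumption in most doubt, one option that avoids choosing a formula at all is isotonic (monotone) regression: the least-squares non-decreasing fit to the points sorted by age. Below is a minimal pure-Python sketch of the pool-adjacent-violators algorithm; `isotonic_fit` and its optional per-point weights are illustrative names, not an existing library function, and the toy numbers are made up. It enforces only the non-negative-derivative constraint.

```python
def isotonic_fit(ys, ws=None):
    """Least-squares non-decreasing fit via pool-adjacent-violators (PAVA).

    ys must already be sorted by x (article age in hours); ws are optional
    per-point weights. Returns one fitted value per input point.
    """
    if ws is None:
        ws = [1.0] * len(ys)
    blocks = []  # each block: [weighted mean, total weight, point count]
    for y, w in zip(ys, ws):
        blocks.append([float(y), float(w), 1])
        # Pool the last two blocks while they violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            wt = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / wt, wt, n1 + n2])
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


if __name__ == "__main__":
    # Toy data (not real likes): hours since publication and like counts.
    hours = [0, 2, 5, 8, 12, 24]
    likes = [0, 14, 9, 30, 25, 60]
    order = sorted(range(len(hours)), key=hours.__getitem__)
    fit = isotonic_fit([likes[i] for i in order])
    print(list(zip([hours[i] for i in order], fit)))
```

The result is a step function rather than a formula, but it can serve as a data-driven check on whether a logarithmic curve is a reasonable shape at all. To also keep the start of the curve near 0, one heuristic is to prepend a point (0, 0) with a very large weight.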
So this estimate is based more on what looks good to my eye than on any mathematical justification.
Any ideas on how to get a good estimate?