I have two datasets: one contains "jittery", varying data points, and the other contains the smoothed values. The image below illustrates this: [image]

How can I calculate the smoothness (or variance) of each line? I would like to be able to show, through some mathematical formula, that the orange dataset varies less than the blue dataset.

user3185731
  • a smoothed curve by definition has less variance than the original data it was derived from! – Mitch Wheat Mar 26 '17 at 16:20
  • I understand that, but I am trying to measure how "smooth" this dataset is. So that I can compare and say dataset X is N% smoother (or has N% less variance) than dataset Y – user3185731 Mar 26 '17 at 18:47
  • I'm voting to close this question as off-topic because it is about [math.se] instead of programming or software development. – Pang Mar 27 '17 at 04:12

1 Answer

Here is one very simple measure that has some weaknesses but some strengths as well.

In each dataset, sort the points by the time value (x-coordinate). Then sum the Euclidean distances between each consecutive pair of points.
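As a minimal sketch of that idea, the function below sorts the points by x and adds up the consecutive distances. The name `path_length` and the sample data are made up for illustration, and it assumes each dataset is a list of `(x, y)` pairs with x and y on comparable scales (Python 3.8+ for `math.dist`).

```python
import math


def path_length(points):
    """Sum of Euclidean distances between consecutive points, ordered by x."""
    # Sort by the time value (x-coordinate) so neighbours are consecutive in time.
    pts = sorted(points, key=lambda p: p[0])
    # Pair each point with its successor and add up the segment lengths.
    return sum(math.dist(a, b) for a, b in zip(pts, pts[1:]))


# Hypothetical sample data: a zig-zag series and a flatter version of it.
jittery = [(0, 0), (1, 3), (2, 0), (3, 3), (4, 0)]
smoothed = [(0, 1), (1, 1.5), (2, 1.5), (3, 1.5), (4, 1)]

print(path_length(jittery), path_length(smoothed))
```

A larger value means the line travels further up and down over the same time span, so the jittery series should score higher than the smoothed one.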

This works well if the total spread in time values is the same for each dataset. If it is not, you may want to divide the sum of distances by the range (maximum minus minimum) of the time values. By this measure, the smallest possible "roughness" belongs to a straight line segment, or to a sequence of segments lying on the same line. If you want the "roughness" measure to be zero for a straight line segment, you could subtract the distance between the initial and terminal points from the sum of the distances. Other adjustments could be made for other goals.
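The two adjustments described above could be combined roughly as follows. This is only a sketch: the function name `roughness`, its keyword flags, and the sample data are hypothetical, and the final percentage is just one possible way to phrase "N% smoother" from the comments, not an established definition.

```python
import math


def roughness(points, normalize=True, subtract_chord=True):
    """Path length along x-sorted points, optionally adjusted.

    subtract_chord: subtract the straight-line distance from the first
        point to the last, so a straight segment scores zero.
    normalize: divide by the range of x values (max minus min).
    """
    pts = sorted(points, key=lambda p: p[0])
    if len(pts) < 2:
        raise ValueError("need at least two points")
    total = sum(math.dist(a, b) for a, b in zip(pts, pts[1:]))
    if subtract_chord:
        total -= math.dist(pts[0], pts[-1])
    if normalize:
        x_range = pts[-1][0] - pts[0][0]
        if x_range == 0:
            raise ValueError("all points share the same x value")
        total /= x_range
    return total


# Hypothetical sample data, for illustration only.
jittery = [(0, 0), (1, 3), (2, 0), (3, 3), (4, 0)]
smoothed = [(0, 1), (1, 1.5), (2, 1.5), (3, 1.5), (4, 1)]

r_jittery = roughness(jittery)
r_smoothed = roughness(smoothed)
# One way to express the comparison as a percentage reduction in roughness.
reduction_pct = 100 * (1 - r_smoothed / r_jittery)
print(f"smoothed is {reduction_pct:.1f}% less rough by this measure")
```

Because the measure mixes x and y in a single Euclidean distance, the result depends on the units of both axes, so both datasets should be expressed on the same scales before comparing them.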

Rory Daulton