Algorithm to smooth a curve while keeping the area under it constant

Question

Consider a discrete curve defined by the points (x1,y1), (x2,y2), (x3,y3), ... ,(xn,yn)

Define a constant SUM = y1+y2+y3+...+yn. Say we change the value of some k number of y points (increase or decrease) such that the total sum of these changed points is less than or equal to the constant SUM.

What would be the best possible manner to adjust the other y points given the following two conditions:

The total sum of the y points (y1'+y2'+...+yn') should remain constant ie, SUM.
The curve should retain as much of its original shape as possible.

A simple solution would be to define some delta as follows:

delta = (ym1' + ym2' + ym3' + ... + ymk') - (ym1 + ym2 + ym3 + ... + ymk')

and to distribute this delta over the rest of the points equally. Here ym1' is the value of the modified point after modification and ym1 is the value of the modified point before modification to give delta as the total difference in modification.

However this would not ensure a totally smoothed curve as area near changed points would appear ragged. Does a better solution/algorithm exist for the this problem?

dmuir · Accepted Answer · 2014-12-08T19:11:00.807

I've used the following approach, though it is a bit OTT.

Consider adding d[i] to y[i], to get s[i], the smoothed value. We seek to minimise

S = Sum{ 1<=i<N-1 | sqr( s[i+1]-2*s[i]+s[i-1] } + f*Sum{ 0<=i<N | sqr( d[i])}

The first term is a sum of the squares of (an approximate) second derivative of the curve, and the second term penalises moving away from the original. f is a (positive) constant. A little algebra recasts this as

S = sqr( ||A*d - b||)

where the matrix A has a nice structure, and indeed A'*A is penta-diagonal, which means that the normal equations (ie d = Inv(A'*A)*A'*b) can be solved efficiently. Note that d is computed directly, there is no need to initialise it.

Given the solution d to this problem we can compute the solution d^ to the same problem but with the constraint One'*d = 0 (where One is the vector of all ones) like this

d^ = d - (One'*d/Q) * e
e = Inv(A'*A)*One
Q = One'*e

What value to use for f? Well a simple approach is to try out this procedure on sample curves for various fs and pick a value that looks good. Another approach is to pick a estimate of smoothness, for example the rms of the second derivative, and then a value that should attain, and then search for an f that gives that value. As a general rule, the bigger f is the less smooth the smoothed curve will be.

Some motivation for all this. The aim is to find a 'smooth' curve 'close' to a given one. For this we need a measure of smoothness (the first term in S) and a measure of closeness (the second term. Why these measures? Well, each are easy to compute, and each are quadratic in the variables (the d[]); this will mean that the problem becomes an instance of linear least squares for which there are efficient algorithms available. Moreover each term in each sum depends on nearby values of the variables, which will in turn mean that the 'inverse covariance' (A'*A) will have a banded structure and so the least squares problem can be solved efficiently. Why introduce f? Well, if we didn't have f (or set it to 0) we could minimise S by setting d[i] = -y[i], getting a perfectly smooth curve s[] = 0, which has nothing to do with the y curve. On the other hand if f is gigantic, then to minimise s we should concentrate on the second term, and set d[i] = 0, and our 'smoothed' curve is just the original. So it's reasonable to suppose that as we vary f, the corresponding solutions will vary between being very smooth but far from y (small f) and being close to y but a bit rough (large f).

It's often said that the normal equations, whose use I advocate here, are a bad way to solve least squares problems, and this is generally true. However with 'nice' banded systems -- like the one here -- the loss of stability through using the normal equations is not so great, while the gain in speed is so great. I've used this approach to smooth curves with many thousands of points in a reasonable time.

To see what A is, consider the case where we had 4 points. Then our expression for S comes down to:

sqr( s[2] - 2*s[1] + s[0]) + sqr( s[3] - 2*s[2] + s[1]) + f*(d[0]*d[0] + .. + d[3]*d[3]).

If we substitute s[i] = y[i] + d[i] in this we get, for example,

s[2] - 2*s[1] + s[0] = d[2]-2*d[1]+d[0] + y[2]-2*y[1]+y[0]

and so we see that for this to be sqr( ||A*d-b||) we should take

A = ( 1 -2  1  0)
    ( 0  1 -2  1)
    ( f  0  0  0)
    ( 0  f  0  0)
    ( 0  0  f  0)
    ( 0  0  0  f)
and
b = ( -(y[2]-2*y[1]+y[0]))
    ( -(y[3]-2*y[2]+y[1]))
    ( 0 )
    ( 0 )
    ( 0 )
    ( 0 )

In an implementation, though, you probably wouldn't want to form A and b, as they are only going to be used to form the normal equation terms, A'*A and A'*b. It would be simpler to accumulate these directly.

I liked this one. It has the concept of minimum energy spline in it. Upvoted it. — fang, Dec 04 '14 at 19:09
I don't understand the solution. Why should we minimize S and what is A? Actually I do understand what is happening but the why of it seems to have dodged me completely. — Sohaib, Dec 05 '14 at 05:22
Nicely explained thanks a lot. Just some more clarity then I'd mark it as correct.What should be the initial value of d? Also what is A? Could you explain a little how you got that second equation `S = sqr( ||A*d - b||)`. — Sohaib, Dec 06 '14 at 16:02
@dmuir If my understanding is correct `y` is the original position of points. s is the smoothed and d is the delta with which we are smoothing. In this whole scenario how do I keep the area under it constant? There seems to be no constraint for that? — Sohaib, Dec 10 '14 at 06:29

score 2 · Answer 2 · edited Dec 04 '14 at 08:05

2

This is a constrained optimization problem. The functional to be minimized is the integrated difference of the original curve and the modified curve. The constraints are the area under the curve and the new locations of the modified points. It is not easy to write such codes on your own. It is better to use some open source optimization codes, like this one: ool.

edited Dec 04 '14 at 08:05

undur_gongor

15,657
5
63
75

answered Dec 04 '14 at 07:37

fang

3,473
1
13
19

That is alright but even using such a library I should have some prototype or algorithm to work with. – Sohaib Dec 04 '14 at 11:08

score 1 · Answer 3 · edited Jun 20 '20 at 09:12

what about to keep the same dynamic range?

compute original min0,max0 y-values
smooth y-values
compute new min1,max1 y-values
linear interpolate all values to match original min max y
```
 y=min1+(y-min1)*(max0-min0)/(max1-min1)
```

that is it

Not sure for the area but this should keep the shape much closer to original one. I got this Idea right now while reading your question and now I face similar problem so I try to code it and try right now anyway +1 for the getting me this Idea :)

You can adapt this and combine with the area

So before this compute the area and apply #1..#4 and after that compute new area. Then multiply all values by old_area/new_area ratio. If you have also negative values and not computing absolute area then you have to handle positive and negative areas separately and find multiplication ration to best fit original area for booth at once.

[edit1] some results for constant dynamic range

constant dynamic range

As you can see the shape is slightly shifting to the left. Each image is after applying few hundreds smooth operations. I am thinking of subdivision to local min max intervals to improve this ...

[edit2] have finished the filter for mine own purposes

void advanced_smooth(double *p,int n)
    {
    int i,j,i0,i1;
    double a0,a1,b0,b1,dp,w0,w1;
    double *p0,*p1,*w; int *q;
    if (n<3) return;
    p0=new double[n<<2]; if (p0==NULL) return;
    p1=p0+n;
    w =p1+n;
    q =(int*)((double*)(w+n));
    // compute original min,max
    for (a0=p[0],i=0;i<n;i++) if (a0>p[i]) a0=p[i];
    for (a1=p[0],i=0;i<n;i++) if (a1<p[i]) a1=p[i];
    for (i=0;i<n;i++) p0[i]=p[i];                   // store original values for range restoration
    // compute local min max positions to p1[]
    dp=0.01*(a1-a0);                                // min delta treshold
    // compute first derivation
    p1[0]=0.0; for (i=1;i<n;i++) p1[i]=p[i]-p[i-1];
    for (i=1;i<n-1;i++)                             // eliminate glitches
     if (p1[i]*p1[i-1]<0.0)
      if (p1[i]*p1[i+1]<0.0)
       if (fabs(p1[i])<=dp)
        p1[i]=0.5*(p1[i-1]+p1[i+1]);
    for (i0=1;i0;)                                  // remove zeros from derivation
     for (i0=0,i=0;i<n;i++)
      if (fabs(p1[i])<dp)
        {
             if ((i>  0)&&(fabs(p1[i-1])>=dp)) { i0=1; p1[i]=p1[i-1]; }
        else if ((i<n-1)&&(fabs(p1[i+1])>=dp)) { i0=1; p1[i]=p1[i+1]; }
        }
    // find local min,max to q[]
    q[n-2]=0; q[n-1]=0; for (i=1;i<n-1;i++) if (p1[i]*p1[i-1]<0.0) q[i-1]=1; else q[i-1]=0;
    for (i=0;i<n;i++)                               // set sign as +max,-min
     if ((q[i])&&(p1[i]<-dp)) q[i]=-q[i];           // this shifts smooth curve to the left !!!
    // compute weights
    for (i0=0,i1=1;i1<n;i0=i1,i1++)                 // loop through all local min,max intervals
        {
        for (;(!q[i1])&&(i1<n-1);i1++);             // <i0,i1>
        b0=0.5*(p[i0]+p[i1]);
        b1=fabs(p[i1]-p[i0]);
        if (b1>=1e-6)
        for (b1=0.35/b1,i=i0;i<=i1;i++)             // compute weights bigger near local min max
         w[i]=0.8+(fabs(p[i]-b0)*b1);
        }
    // smooth few times
    for (j=0;j<5;j++)
        {
        for (i=0;i<n  ;i++) p1[i]=p[i];             // store data to avoid shifting by using half filtered data
        for (i=1;i<n-1;i++)                         // FIR smooth filter
            {
            w0=w[i];
            w1=(1.0-w0)*0.5;
            p[i]=(w1*p1[i-1])+(w0*p1[i])+(w1*p1[i+1]);
            }
        for (i=1;i<n-1;i++)                         // avoid local min,max shifting too much
            {
            if (q[i]>0)                             // local max
                {
                if (p[i]<p[i-1]) p[i]=p[i-1];       // can not be lower then neigbours
                if (p[i]<p[i+1]) p[i]=p[i+1];
                }
            if (q[i]<0)                             // local min
                {
                if (p[i]>p[i-1]) p[i]=p[i-1];       // can not be higher then neigbours
                if (p[i]>p[i+1]) p[i]=p[i+1];
                }
            }
        }
    for (i0=0,i1=1;i1<n;i0=i1,i1++)                 // loop through all local min,max intervals
        {
        for (;(!q[i1])&&(i1<n-1);i1++);             // <i0,i1>
        // restore original local min,max
        a0=p0[i0]; b0=p[i0];
        a1=p0[i1]; b1=p[i1];
        if (a0>a1)
            {
            dp=a0; a0=a1; a1=dp;
            dp=b0; b0=b1; b1=dp;
            }
        b1-=b0;
        if (b1>=1e-6)
         for (dp=(a1-a0)/b1,i=i0;i<=i1;i++)
          p[i]=a0+((p[i]-b0)*dp);
        }
    delete[] p0;
    }

so p[n] is the input/output data. There are few things that can be tweaked like:

weights computation (constants 0.8 and 0.35 means weights are <0.8,0.8+0.35/2>)
number of smooth passes (now 5 in the for loop)
the bigger the weight the less the filtering 1.0 means no change

The main Idea behind is:

find local extremes
compute weights for smoothing

so near local extremes are almost none change of the output
smooth
repair dynamic range per each interval between all local extremes

[Notes]

I did also try to restore the area but that is incompatible with mine task because it distorts the shape a lot. So if you really need the area then focus on that and not on the shape. The smoothing causes signal to shrink mostly so after area restoration the shape rise on magnitude.

Actual filter state has none markable side shifting of shape (which was the main goal for me). Some images for more bumpy signal (the original filter was extremly poor on this):

constant dynamic range of local extremes

As you can see no visible signal shape shifting. The local extremes has tendency to create sharp spikes after very heavy smoothing but that was expected

Hope it helps ...

Glad the question helped you. I don't believe this approach would work for me though.The problem is not to smooth but to smooth while keeping some K number of values (those modified) constant. — Sohaib, Dec 04 '14 at 11:13
@Sohaib just finished the filter. it is a bit more complicated now after solving few unexpected things like the shape shifting to one side ... will edit my answer in a moment — Spektre, Dec 04 '14 at 12:16
I still don't think you got my question completely :) 'Say we change the value of some k number of y points (increase or decrease) such that the total sum of these changed points is less than or equal to the constant SUM.' This has to be taken care of. — Sohaib, Dec 05 '14 at 04:53

Algorithm to smooth a curve while keeping the area under it constant

3 Answers3