
How do you find an abrupt change in an array? For example, given the following array:

1,3,8,14,58,62,69
In this case, there is a jump from 14 to 58

OR

79,77,68,61,9,3,1
In this case, there is a drop from 61 to 9

Both examples contain small and large jumps. For example, in the second case there is a small drop from 77 to 68, but it must be ignored if a larger jump or drop is found. I have the following algorithm in mind, but I am not sure whether it covers all possible cases:

ALGO
Iterate over the array
Compute diff = arr[i+1] - arr[i]
Store the first difference in a variable
If the next diff is bigger than the stored one, overwrite it
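
A minimal Java sketch of the steps above, assuming "bigger" means the largest absolute difference so that drops count as well as rises. The class and method names are illustrative only.

```java
// Sketch: find the single largest jump between neighbouring elements.
final class LargestJump {

    /**
     * Returns the index i at which |arr[i+1] - arr[i]| is largest,
     * or -1 if the array has fewer than two elements.
     * Ties resolve to the earliest index.
     */
    static int indexOfLargestJump(int[] arr) {
        int bestIndex = -1;
        long bestDiff = -1;
        for (int i = 0; i + 1 < arr.length; i++) {
            // Use long so the subtraction cannot overflow for extreme ints.
            long diff = Math.abs((long) arr[i + 1] - arr[i]);
            if (diff > bestDiff) {
                bestDiff = diff;
                bestIndex = i;
            }
        }
        return bestIndex;
    }

    public static void main(String[] args) {
        int[] rising = {1, 3, 8, 14, 58, 62, 69};
        int[] falling = {79, 77, 68, 61, 9, 3, 1};
        int i = indexOfLargestJump(rising);
        int j = indexOfLargestJump(falling);
        System.out.println(rising[i] + " -> " + rising[i + 1]);   // 14 -> 58
        System.out.println(falling[j] + " -> " + falling[j + 1]); // 61 -> 9
    }
}
```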

However, this algo will not work for the following case:

1, 2, 4, 6, 34, 38, 41, 67, 69, 71

There are two jumps in this array, so it should be split like this:

[1, 2, 4, 6], [34, 38, 41], [67, 69, 71]
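
For illustration, the split itself could be sketched as below, assuming a gap threshold is already known. The question does not define one, so `minGap`, `JumpSplitter`, and `splitAtJumps` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: cut the array into runs wherever the gap to the previous
// element exceeds a caller-supplied threshold (an assumption).
final class JumpSplitter {

    static List<int[]> splitAtJumps(int[] arr, int minGap) {
        List<int[]> groups = new ArrayList<>();
        if (arr.length == 0) {
            return groups;
        }
        int start = 0; // first index of the run currently being built
        for (int i = 1; i < arr.length; i++) {
            long gap = Math.abs((long) arr[i] - arr[i - 1]);
            if (gap > minGap) {
                // Close the current run just before the jump.
                groups.add(Arrays.copyOfRange(arr, start, i));
                start = i;
            }
        }
        groups.add(Arrays.copyOfRange(arr, start, arr.length));
        return groups;
    }
}
```

With `minGap = 10`, the array above comes back as the three runs shown. The hard part, choosing `minGap`, is left open here.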
Twitty
  • what will be output for 1, 2, 4, 6, 34, 38, 41, 67, 69, 71? `[1, 2, 4, 6], [34, 38, 41], [67, 69, 71]`or, `28` (max jump)?? – Shahid Sep 06 '16 at 05:13
  • The output would be the index/value of start of a jump/drop. For example, [1, 2, 4, 6], [34, 38, 41], [67, 69, 71] will have an output like 6 and 41 – Twitty Sep 06 '16 at 05:16
  • 2
    This sounds a lot like an [edge detection](https://en.wikipedia.org/wiki/Edge_detection) problem, or the 1D analog, [step detection](https://en.wikipedia.org/wiki/Step_detection). –  Sep 06 '16 at 05:17
  • @friendlydog, I couldn't find any Java implementation for this algo, or simple steps to implement it (not the complex math notation). – Twitty Sep 06 '16 at 05:21
  • 4
    how you are defining big jump or big drop? any threshold value? – Shahid Sep 06 '16 at 05:22
  • You first need to define what you consider a jump. Your last example could just as well be `[1, 2, 4, 6], [34], [38], [41], [67, 69, 71]`. Maybe something like: "if the gap is bigger than one/1.5/two standard deviations from the average gap between numbers" – Erwin Bolwidt Sep 06 '16 at 05:25
  • @Shahid, All the values are pretty random. Thus, defining a threshold might be difficult. – Twitty Sep 06 '16 at 05:25
  • @Twitty It's a complex subject, for sure. I don't think you're going to find a "simple" solution that doesn't involve statistics, outside of a library that handles it for you, unless you can simplify your problem down by defining a clear gap size. –  Sep 06 '16 at 05:26
  • @friendlydog, is there any java implementation of step detection? Or could you post an answer with simple (non math) steps how it works? – Twitty Sep 06 '16 at 05:29
  • @Twitty I can't find any, sorry. It's beyond me. And it doesn't seem like you can avoid the math in this problem. Maybe if you provide more info about what your actual data set looks like, or post a link to all your test cases, someone can analyze it and suggest a simpler approach that will work for those cases (and only those cases). But the _general_ problem involves math/statistics. No way around it. –  Sep 06 '16 at 05:43

3 Answers

2

In the end, this is pure statistics. You have a data set, and you are looking for certain forms of outliers. In that sense, your requirement to detect "abrupt changes" is not very precise.

I think you should step back here and take a deeper look at the mathematics behind your problem, so that you come up with clear "semantics" and crisp definitions for your actual problem (for example, based on average, deviation, etc.). The Wikipedia link I gave above should be a good starting point for that part.
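
As one possible way to make "abrupt" precise, here is a sketch that flags a gap when it exceeds the mean gap by more than `k` standard deviations. The rule, the value of `k`, and the names `GapOutliers` and `jumpStarts` are assumptions for illustration, not an established definition.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: treat a gap as a jump when it is larger than
// mean(gaps) + k * stdDev(gaps). Uses the population standard deviation.
final class GapOutliers {

    /** Returns every index i where the gap between arr[i] and arr[i+1] is an outlier. */
    static List<Integer> jumpStarts(int[] arr, double k) {
        List<Integer> starts = new ArrayList<>();
        int n = arr.length - 1; // number of gaps
        if (n < 1) {
            return starts;
        }
        double[] gaps = new double[n];
        double sum = 0;
        for (int i = 0; i < n; i++) {
            gaps[i] = Math.abs((double) arr[i + 1] - arr[i]);
            sum += gaps[i];
        }
        double mean = sum / n;
        double squared = 0;
        for (double g : gaps) {
            squared += (g - mean) * (g - mean);
        }
        double stdDev = Math.sqrt(squared / n);
        double threshold = mean + k * stdDev;
        for (int i = 0; i < n; i++) {
            if (gaps[i] > threshold) {
                starts.add(i);
            }
        }
        return starts;
    }
}
```

For `1, 2, 4, 6, 34, 38, 41, 67, 69, 71` with `k = 1.0`, this returns indices 3 and 6, that is, the values 6 and 41. Other data may need a different `k`, or a different rule altogether.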

From there on, to get to a Java implementation, you might start looking here.

GhostCat
1

I would look into using a moving average. This involves looking at the average of the last X values. Do this based on the change in value (Y1 - Y2). Any large deviation from the average could be seen as a big shift.

However, given how small your datasets are, a moving average would likely yield bad results. With such a small sample size it might be better to take the average over all the changes in the array instead:

double[] nums = {79, 77, 68, 61, 9, 3, 1};
double[] deltas = new double[nums.length - 1];
double avgDelta = 0;

// compute each consecutive difference and the mean absolute difference
for (int i = 0; i < nums.length - 1; i++) {
    deltas[i] = nums[i + 1] - nums[i];
    // average the magnitudes so rises and drops in the same array do not cancel out
    avgDelta += Math.abs(deltas[i]) / deltas.length;
}

// report every delta whose magnitude exceeds the average
for (int i = 0; i < deltas.length; i++) {
    if (Math.abs(deltas[i]) > avgDelta) {
        System.out.println("Big jump between " + nums[i] + " " + nums[i + 1]);
    }
}
ug_
  • It seems to work on all the examples I provided in the question. I will wait for others to analyze the answer too and make sure this won't fail in any hidden scenario. – Twitty Sep 06 '16 at 05:49
  • 1
    This algorithm is simply trying to find the above average differences. This will fail if the series suddenly includes high differences. Like `1,4,8,13,19,39,60,84,109`. Differences are `3,4,5,6,20,21,24,25`, only jump is at 19 to 39, but it gives every next number result as a jump because their difference is more than the total average. You should try an actual moving average. – 11thdimension Sep 06 '16 at 06:25
0

This problem doesn't have an absolute solution; you'll have to determine thresholds for the context in which the solution is to be applied.

No algorithm can give us the rule for what counts as a jump. As humans, we can spot these changes because we see the entire data set at a glance. But if the data set is large enough, it becomes difficult even for us to say which jumps should count. For example, if the average difference between consecutive numbers is 10, then any difference above that would be considered a jump. However, a large data set may contain differences that are isolated spikes as well as differences that start a new normal, for example when differences of about 10 suddenly become about 100. We then have to decide whether to judge jumps against an average difference of 10 or of 100.

If we are interested in local spikes only, then it's possible to use a moving average, as suggested by @ug_.

However, a moving average has to be moving, meaning we maintain a set of local numbers with a fixed size. On that set we calculate the average of the differences and then compare it to the local differences.

However, here we again face the problem of determining the size of the local set. This size determines the granularity of the jumps that we capture. A very large set will tend to ignore jumps that are close together, and a smaller one will tend to produce false positives.
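
A rough sketch of such a moving window, separate from the solution further down: it compares each difference with the average of the `window` differences before it. The `factor` multiplier and the names `MovingAverageJumps` and `jumpStarts` are assumptions made for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: flag gap i when it exceeds factor * (mean of the previous `window` gaps).
final class MovingAverageJumps {

    /** Returns every index i where the gap between arr[i] and arr[i+1] is flagged. */
    static List<Integer> jumpStarts(int[] arr, int window, double factor) {
        List<Integer> starts = new ArrayList<>();
        int n = arr.length - 1; // number of gaps
        if (window < 1 || n <= window) {
            return starts; // not enough gaps to fill one window
        }
        double[] gaps = new double[n];
        for (int i = 0; i < n; i++) {
            gaps[i] = Math.abs((double) arr[i + 1] - arr[i]);
        }
        // Sum of the first `window` gaps; updated incrementally as the window slides.
        double windowSum = 0;
        for (int i = 0; i < window; i++) {
            windowSum += gaps[i];
        }
        for (int i = window; i < n; i++) {
            double localMean = windowSum / window;
            if (gaps[i] > factor * localMean) {
                starts.add(i);
            }
            // Slide: add the current gap, drop the oldest one.
            windowSum += gaps[i] - gaps[i - window];
        }
        return starts;
    }
}
```

With `window = 3`, a `factor` of 3 flags only the 19 to 39 step in `1,4,8,13,19,39,60,84,109`, but the same setting reports only the first jump in `1, 2, 4, 6, 34, 38, 41, 67, 69, 71`, which needs a `factor` of 2 to yield both. That is the tuning problem described above.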

The following is a simple solution in which you can try setting the thresholds. The local set size in this case is 3, the minimum that can be used, because it gives us the minimum required number of differences, which is 2.

public class TestJump {
    public static void main(String[] args) {
        int[] arr = {1, 2, 4, 6, 34, 38, 41, 67, 69, 71};
        //int[] arr = {1,4,8,13,19,39,60,84,109};

        double thresholdDeviation = 50; //percent jump to detect, set for your requirement
        double thresholdDiff = 3; //Minimum difference between consecutive differences to avoid false positives like 1,2,4

        System.out.println("Started");

        for(int i = 1; i < arr.length - 1; i++) {
            double diffPrev = Math.abs(arr[i] - arr[i-1]);
            double diffNext = Math.abs(arr[i+1] - arr[i]);

            double deviation = Math.abs(diffNext - diffPrev) / diffPrev * 100;

            if(deviation > thresholdDeviation && Math.abs(diffNext - diffPrev) > thresholdDiff) {
                System.out.printf("Abrupt change @ %d: (%d, %d, %d)%n", i, arr[i-1], arr[i], arr[i+1]);
                i++;
            }
            //System.out.println(deviation + " : " + Math.abs(diffNext - diffPrev));
        }

        System.out.println("Finished");
    }
}

Output

Started
Abrupt change @ 3: (4, 6, 34)
Abrupt change @ 6: (38, 41, 67)
Finished

If you're trying to solve a larger problem than just arrays, like finding spikes in medical data or images, then you should check out neural networks.

11thdimension