
I'm working on a small theoretical machine learning algorithm using Node.js. My goal is to compare many array patterns to one source pattern, then return how similar they are, expressed as a percentage. For example, pattern1 may be 80% similar to the source pattern.

What is the best method for determining the percent similarity of one array to another?

What I've done so far:

//source   
var soureSequence = [0.53,0.55,0.50,0.40,0.50,0.52,0.58,0.60]

//patterns to compare
var sequence1 = [0.53,0.54,0.49,0.40,0.50,0.52,0.58,0.60]
var sequence2 = [0.53,0.55,0.50,0.42,0.50,0.53,0.57,0.62]

Since I've chosen a percent-based outcome, I figured I should base my source pattern on the percentage change from each value to the next value in the array.

 var percentChange = (firstVal, secondVal) => {
        var pChange = ((parseFloat(secondVal) - firstVal) /
         Math.abs(firstVal)) * 100.00;

        // To avoid NaN, Infinity, and zero, fall back to a tiny non-zero value

        if(!Number.isFinite(pChange) || pChange === 0){
            return 0.00000001
        }
        return pChange;
    }

Here I generate my source pattern from my source sequence:

       var storePattern = function(sequence){
           var pattern = [];
           // Stop one short of the end so that sequence[i + 1] always exists
           for(var i = 0 ; i < sequence.length - 1 ; i++){
               let $change = percentChange(sequence[i] , sequence[i + 1]);
               pattern.push($change)
           }
           return pattern;
       }



   var sourcePattern = storePattern(soureSequence);

Now I create more patterns to be compared:

   var testPattern1 = storePattern(sequence1);
   var testPattern2 = storePattern(sequence2);

Below is my comparison function:

 var processPattern = function(source , target){
    var similarityArray = [];

    for(var i = 0 ; i < target.length ; i++){
        // Compare the percent change at index i of the test pattern to the source pattern at the same index
        let change = Math.abs(percentChange(target[i] , source[i]));
        similarityArray.push(100.00 - change);
    }

    var rating = similarityArray.reduce((a,b) => {
        return a + b
    });

    // Returns a percent rating based on the average of the similarity array

    rating = rating / source.length;
    return rating;
}

Now I can try to estimate the similarity:

var similarityOfTest1 = processPattern(sourcePattern , testPattern1)

My problem is that this only works on sequences within the same range of values. For example, the percent change from 0.50 to 0.52 is not the same as the percent change from 0.20 to 0.22, even though the difference in value is the same, i.e. 0.02.
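
A quick sketch of that mismatch, using the two pairs above. The helper names `pct` and `absDiff` are hypothetical and exist only for this illustration:

```javascript
// Hypothetical helpers, for illustration only
const pct = (a, b) => ((b - a) / Math.abs(a)) * 100; // relative change in percent
const absDiff = (a, b) => b - a;                     // absolute change

console.log(pct(0.50, 0.52).toFixed(2));     // "4.00"
console.log(pct(0.20, 0.22).toFixed(2));     // "10.00"
console.log(absDiff(0.50, 0.52).toFixed(2)); // "0.02"
console.log(absDiff(0.20, 0.22).toFixed(2)); // "0.02"
```

The same step of 0.02 counts as a 4% change in one range and a 10% change in the other, which is why the percent-based patterns stop being comparable across ranges.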

I thought about a pattern based on differences in value instead, but at this point I'm lost.

All answers will be considered. Thanks for the help!

Sagar V
KpTheConstructor
  • So you are trying to find the differences between the two arrays as a whole to produce a percentage? Or do you mean the differences between the individual array values? – Rick Jun 22 '17 at 18:13
  • IMO the rules of "how different" two arrays are depend entirely on why that difference is important, or what you are using it for. Essentially processPattern is a [fitness function](https://en.wikipedia.org/wiki/Fitness_function) and you should heed the caveats of its design accordingly. – James Jun 22 '17 at 18:17
  • @Arrow my theory is that the difference between individual array values will ultimately determine the overall percentage of how similar each pattern is to the source pattern. – KpTheConstructor Jun 24 '17 at 15:58
  • @James the difference is important as it is the only method so far of trying to find similar characteristics between each pattern, whether it be a difference in percentage or in value. – KpTheConstructor Jun 24 '17 at 16:07
  • How about cosine similarity as a similarity measure? https://en.wikipedia.org/wiki/Cosine_similarity – mbpaulus Jul 02 '17 at 23:55
  • [The similarity measurement](http://dataaspirant.com/2015/04/11/five-most-popular-similarity-measures-implementation-in-python/) may be helpful – seyyah Jul 03 '17 at 00:38
  • Of interest: [Metric](https://en.wikipedia.org/wiki/Metric_(mathematics)) – Guy Coder Aug 01 '17 at 16:55

4 Answers

I used reduce to get the differences, then the average.

//patterns to compare
var sequence1 = [0.53,0.54,0.49,0.40,0.50,0.52,0.58,0.60]
var sequence2 = [0.53,0.55,0.50,0.42,0.50,0.53,0.57,0.62]

function diff(sequence){
    var soureSequence = [0.53,0.55,0.50,0.40,0.50,0.52,0.58,0.60];
    // Element-wise differences from the source; note that `i &&` skips index 0
    var delta = soureSequence.reduce(function (r, a, i, aa) {
        i && r.push(a - sequence[i]);
        return r;
    }, []),
    // Mean of the signed differences
    average = delta.reduce(function (a, b) { return a + b; }) / delta.length;

    return {delta:delta, average:average};
}
console.log('sequence1',diff(sequence1));
console.log('sequence2',diff(sequence2));
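
One caveat with averaging signed differences is that positive and negative deviations can cancel each other out. A minimal sketch of a variant that averages the absolute differences instead; the name `meanAbsDiff` is hypothetical and not part of the code above, and it assumes both arrays have the same length:

```javascript
// Hypothetical variant: average of the absolute element-wise differences.
// Assumes both arrays have the same length.
function meanAbsDiff(source, sequence) {
    var total = source.reduce(function (sum, value, i) {
        return sum + Math.abs(value - sequence[i]);
    }, 0);
    return total / source.length;
}

var source = [0.53,0.55,0.50,0.40,0.50,0.52,0.58,0.60];
console.log('sequence1', meanAbsDiff(source, [0.53,0.54,0.49,0.40,0.50,0.52,0.58,0.60]));
console.log('sequence2', meanAbsDiff(source, [0.53,0.55,0.50,0.42,0.50,0.53,0.57,0.62]));
```

With this measure a smaller value means a closer match; for the two sequences above it comes out at roughly 0.0025 and 0.0075.
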
Rick

In my experience, the similarity of two vectors (arrays) is measured using the dot product. You multiply each pair of corresponding elements of the arrays, add those products up, then divide by the magnitude of each array (the square root of the sum of the squares of each component). The normalized result is known as the cosine similarity. Rosetta Code has an example of the dot product in JavaScript, copied here:

// dotProduct :: [Int] -> [Int] -> Int
const dotProduct = (xs, ys) => {
    const sum = xs => xs ? xs.reduce((a, b) => a + b, 0) : undefined;

    return xs.length === ys.length ? (
        sum(zipWith((a, b) => a * b, xs, ys))
    ) : undefined;
}

// zipWith :: (a -> b -> c) -> [a] -> [b] -> [c]
const zipWith = (f, xs, ys) => {
    const ny = ys.length;
    return (xs.length <= ny ? xs : xs.slice(0, ny))
        .map((x, i) => f(x, ys[i]));
}

So, you would call

const score1 = dotProduct(sourceSequence, sequence1);
const score2 = dotProduct(sourceSequence, sequence2);

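Whichever score is bigger indicates the sequence closer to sourceSequence, provided the scores are also divided by the magnitudes as described above; the raw dot product on its own grows with the size of the values as well.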
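
The copied dotProduct does not include the division by the magnitudes. A minimal sketch of the full calculation (cosine similarity) follows; the names `dot`, `magnitude` and `cosineSimilarity` are hypothetical, and the code assumes equal-length numeric arrays with non-zero magnitude:

```javascript
// Cosine similarity: dot(a, b) / (|a| * |b|).
// Assumes equal-length numeric arrays with non-zero magnitude.
const dot = (xs, ys) => xs.reduce((sum, x, i) => sum + x * ys[i], 0);
const magnitude = xs => Math.sqrt(dot(xs, xs));
const cosineSimilarity = (xs, ys) =>
    dot(xs, ys) / (magnitude(xs) * magnitude(ys));

const source    = [0.53,0.55,0.50,0.40,0.50,0.52,0.58,0.60];
const sequence1 = [0.53,0.54,0.49,0.40,0.50,0.52,0.58,0.60];
const sequence2 = [0.53,0.55,0.50,0.42,0.50,0.53,0.57,0.62];

// Multiply by 100 to express the score as a percentage
console.log(cosineSimilarity(source, sequence1) * 100);
console.log(cosineSimilarity(source, sequence2) * 100);

// A scaled copy still scores 100% (up to rounding): only direction is compared
console.log(cosineSimilarity(source, source.map(x => x * 2)) * 100);
```

For these inputs sequence1 scores slightly higher than sequence2, but both land very close to 100%, so this measure may not separate similar patterns as sharply as a difference-based one.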

Sam H.
  • I think this is close to what I'm trying to achieve. With this method, say I have 100 test patterns: how do I rank them in order, and what is the ranking scale? For example, my original method would produce a percentage score. – KpTheConstructor Jun 29 '17 at 18:44
  • The dot product, once divided by the magnitudes, takes two vectors and returns a number between 0 and 1, inclusive, for arrays of non-negative values like these. 1 is maximum similarity, 0 is minimum similarity. So you can multiply by 100 to get a percent score if needed. Note that 1 does not mean they are exactly the same vector: if sequence2 is like sequence1 but with each element multiplied by the same positive constant, the score will be 1. – Sam H. Jun 29 '17 at 20:11

I'm not sure you need machine learning for this. You have a source pattern and you have some inputs and you basically want to perform a diff of the patterns.

Machine learning could be used to find the patterns, assuming you have some heuristic for measuring the error (if you're using unsupervised learning techniques) or you have sample sets to train the network.

But if you simply want to measure the differences between one pattern and another, then just perform a diff operation. What you'll need to do is decide which differences you're measuring and how to normalize the result.
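
As one possible illustration of those two decisions, the sketch below measures the root-mean-square difference and normalizes it by the value range of the source. Both choices, and the name `normalizedSimilarity`, are assumptions made for this example rather than a recommended method; the code assumes equal-length arrays and a source that is not constant:

```javascript
// Hypothetical sketch: root-mean-square difference, normalized by the
// source's value range and reported as a percentage.
// Assumes equal-length arrays and a non-constant source (range > 0).
function normalizedSimilarity(source, target) {
    const range = Math.max(...source) - Math.min(...source);
    const meanSquare = source.reduce(
        (sum, value, i) => sum + (value - target[i]) ** 2, 0) / source.length;
    const nrmse = Math.sqrt(meanSquare) / range;
    // Clamp at 0 so very different patterns do not produce negative scores
    return Math.max(0, 1 - nrmse) * 100;
}

const source = [0.53,0.55,0.50,0.40,0.50,0.52,0.58,0.60];
console.log(normalizedSimilarity(source, [0.53,0.54,0.49,0.40,0.50,0.52,0.58,0.60]).toFixed(2) + "%");
console.log(normalizedSimilarity(source, [0.53,0.55,0.50,0.42,0.50,0.53,0.57,0.62]).toFixed(2) + "%");
```

Because the error is scaled by the source's own range rather than by the raw values, a deviation of 0.02 counts the same whether the pattern sits around 0.50 or around 0.20.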

Thomas Cook

I can't tell exactly how you would like to measure the similarity. I calculate the difference between corresponding items and accumulate these differences to see how large the total deviation is relative to the sum of the source array. You can adjust the calculation as you like.

function check([x,...xs],[y,...ys], state = {sumSource: 0, sumDiff: 0}){
  state.sumSource += x;
  state.sumDiff += Math.abs(x-y);
  return xs.length ? check(xs,ys,state) : (100 - 100 * state.sumDiff / state.sumSource).toFixed(4) + "% similarity";
}

var soureSequence = [0.53,0.55,0.50,0.40,0.50,0.52,0.58,0.60],
    sequence1     = [0.53,0.54,0.49,0.40,0.50,0.52,0.58,0.60],
    sequence2     = [0.53,0.55,0.50,0.42,0.50,0.53,0.57,0.62];

console.log(check(soureSequence,sequence1));
console.log(check(soureSequence,sequence2));
Redu