
As I understand it, a hash function normally creates a unique number. Are there also hash functions that can be used for approximate comparisons?

For example, these two series would match:

6 7 8 9 10 11 23 40 10
5 8 10 9 9 12 24 40 20

whereas this one would not match them:

25 7 12 9 10 12 90 90

I'm asking because I'm thinking about pattern recognition. Is there some math that lets you specify the percentage of match you want to find? I'm using C# as the programming language.

Some clarification: first let me explain an analogy for what I'd like to catch. Imagine water droplets falling, but not in a constant flow, and the measurement tools are not perfect either. I am timing the intervals between falling droplets. Each measurement is a series of, say, 19 to 25 droplets; in fact I can measure such a series all at once, for example if I had a camera and filmed it.

Given such a series, I'd like to figure out whether the next series is different or the same. There may be a random time gap between series, and the measurement tools don't detect the beginning or end of a series; they just take between 19 and 25 measurements at once.

I'm not sure which direction to take with this: maybe fuzzy logic, neural-network pattern detection, distance vectors... There seem to be lots of ways, but I wonder whether there is something simpler (I was thinking of something like a hash, but maybe it should be something else).

user613326
  • I'm sure there are fuzzy-logic algorithms out there that could do this, but I don't think hashing would help you. This isn't going to be a cheap algorithm (probably not as cheap as hashing). – corsiKa Nov 02 '12 at 16:28
  • If you're strictly using integers, why not use some form of distance formula to calculate the distance between the two points and print that? A hash is used to create a globally unique fingerprint of the data, and comparing similar inputs is really against what most hashes are for. – Grambot Nov 02 '12 at 16:31
  • Hash functions are typically designed so that similar data has hashes as far apart as feasible. That's not the algorithm you're looking for. – Bobson Nov 02 '12 at 16:33
  • Possibly related: http://stackoverflow.com/q/5656293/56778 and http://stackoverflow.com/q/4834301/56778. Also, a hash function doesn't create a unique number. Multiple data streams can hash to the same value. – Jim Mischel Nov 02 '12 at 16:57
  • The question needs clarification. Are you trying to avoid storing/processing whole large arrays, or are you simply asking how to do pattern recognition? If you want to store large arrays efficiently while maintaining their essential characteristics, then perhaps some image-compression algorithm would be useful. – agentp Nov 03 '12 at 14:12
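Following Grambot's distance-formula suggestion, here is a minimal C# sketch that treats each series as a point and computes the Euclidean distance, plus a crude "percentage of match" (the share of positions whose values differ by at most a tolerance). The class name `SeriesDistance`, the `tolerance` parameter and the rule of comparing only the overlapping prefix when the series have different lengths are assumptions for illustration, not something stated in the question.

```csharp
using System;

// Hypothetical helper for comparing two measurement series.
public static class SeriesDistance
{
    // Euclidean distance over the overlapping part of the two series.
    // Assumption: series of different lengths are compared position by position
    // up to the shorter length; extra trailing values are ignored.
    public static double Euclidean(int[] a, int[] b)
    {
        int n = Math.Min(a.Length, b.Length);
        double sumOfSquares = 0;
        for (int i = 0; i < n; i++)
        {
            double d = a[i] - b[i];
            sumOfSquares += d * d;
        }
        return Math.Sqrt(sumOfSquares);
    }

    // Percentage (0..100) of overlapping positions whose values differ by at most `tolerance`.
    public static double MatchPercentage(int[] a, int[] b, int tolerance)
    {
        int n = Math.Min(a.Length, b.Length);
        if (n == 0) return 0;
        int close = 0;
        for (int i = 0; i < n; i++)
        {
            if (Math.Abs(a[i] - b[i]) <= tolerance) close++;
        }
        return 100.0 * close / n;
    }
}
```

With the question's first two series, `Euclidean` gives √109 ≈ 10.4; for the first and third series (compared over 8 values) it gives √7367 ≈ 85.8, so a threshold in between would separate them. `MatchPercentage(a, b, tolerance: 2)` gives about 88.9% and 50% respectively. Choosing the threshold, and aligning series whose start is unknown, is left open here.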

1 Answer


Hash functions can be used to identify values, but not uniquely. Hash codes are not guaranteed to be unique; more precisely, whenever there are more possible values than possible hash codes, some different values are guaranteed to share a hash code. A small deviation in the value usually results in a completely different hash code (as @Bobson has already mentioned). Another use of hash codes is detecting the inequality of two values in constant time: if their precomputed hash codes differ, the values are certainly different; if the codes match, the values themselves still have to be compared.
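To illustrate both points, here is a minimal C# sketch using 32-bit FNV-1a, a well-known non-cryptographic hash, over the bytes of an `int` series. The class and method names (`HashDemo`, `Fnv1a`, `AreEqual`) are hypothetical. Once the hashes are computed and stored alongside the data, a mismatch proves inequality with a single integer comparison, while a match still requires a full comparison.

```csharp
using System;
using System.Linq;

public static class HashDemo
{
    // 32-bit FNV-1a over the little-endian bytes of each int.
    public static uint Fnv1a(int[] values)
    {
        unchecked
        {
            uint hash = 2166136261;           // FNV offset basis
            foreach (int v in values)
            {
                for (int shift = 0; shift < 32; shift += 8)
                {
                    hash ^= (uint)((v >> shift) & 0xFF);
                    hash *= 16777619;         // FNV prime
                }
            }
            return hash;
        }
    }

    // Compares precomputed hashes first: a mismatch proves inequality in O(1);
    // a match only means "possibly equal", so the arrays are then compared in full.
    public static bool AreEqual(int[] a, uint hashA, int[] b, uint hashB)
    {
        if (hashA != hashB) return false;
        return a.SequenceEqual(b);
    }
}
```

For the question's purpose this is exactly the wrong property: `{ 6, 7, 8, 9 }` and `{ 6, 7, 8, 10 }` are close, yet their hashes carry no hint of that closeness.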

It might be possible to design a hash function that does what you want, especially if you know the domain your values live in, but doing so requires some mathematical background.

As far as I know there is no hash function for the example you gave.
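One crude way to build a domain-specific "approximate hash" is bucketing (quantization): divide each value by a bucket width and round down, so nearby values often share a key. The minimal C# sketch below uses hypothetical names (`BucketKey.For`), and the bucket width is a parameter you would have to choose. It also shows the weakness: values that are close but straddle a bucket boundary still get different keys, which is one reason a plain hash struggles with the example in the question.

```csharp
using System;
using System.Linq;

public static class BucketKey
{
    // Maps each value to floor(value / width) and joins the results into a string key,
    // so series whose values fall into the same buckets get identical keys
    // (usable, for example, as a Dictionary key).
    public static string For(int[] values, int width)
    {
        if (width <= 0) throw new ArgumentOutOfRangeException(nameof(width));
        return string.Join(",", values.Select(v => FloorDiv(v, width)));
    }

    // Integer division rounding toward negative infinity (C# '/' truncates toward zero),
    // so negative values are bucketed consistently.
    private static int FloorDiv(int v, int width)
    {
        int q = v / width;
        if (v % width != 0 && v < 0) q--;
        return q;
    }
}
```

With width 5, 7 and 8 share a key but 9 and 10 do not, although both pairs differ by 1. Likewise, the question's first two series already diverge at their third values (8 falls in bucket 1, 10 in bucket 2).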

Here is another idea for integers: use modulo-10 operations to extract the digits and sum the absolute differences between corresponding digits. This way you calculate a 'distance' between two numbers, not their 'difference'. I once did something similar with strings to find strings that are close to each other.

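In C#, the idea looks like this:

using System;

static class Program
{
    // Sums the absolute differences between corresponding decimal digits of x and y.
    // Assumes x and y are non-negative; a shorter number is treated as if padded with leading zeros.
    static int Distance(int x, int y)
    {
        int result = 0;
        // Loop until BOTH numbers run out of digits (with && the extra digits
        // of the longer number would be silently ignored).
        while (x > 0 || y > 0)
        {
            result += Math.Abs(x % 10 - y % 10);
            x /= 10;
            y /= 10;
        }
        return result;
    }

    static void Main()
    {
        int distance = Distance(123, 456);   // |3-6| + |2-5| + |1-4| = 9

        if (distance == 0) Console.WriteLine("x and y are equal");
        else Console.WriteLine("the relative distance between x and y = " + distance);
    }
}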
PapaAtHome
  • Hmm, why modulo 10? I have also been thinking a bit about modulo, but then, for example, 68 and 72 may not match while 72 and 76 would, even though both pairs differ by 4. I was wondering whether it could be used for pattern search, i.e. on series of numbers. I am doing some sensor readings and wonder if I can detect patterns in them. – user613326 Nov 02 '12 at 17:55
  • @user613326 - You might want to go ask on http://math.stackexchange.com/ to come up with a good algorithm for pattern searching in integer sets. – Bobson Nov 02 '12 at 19:06
  • @user613326 Any base will do; 10 comes quite naturally. Another option is to weight the difference between digits by the digit's distance from the least significant digit. If you weight with powers of the base, the distance is never smaller than the plain difference and equals it whenever every digit of one number is at least the corresponding digit of the other, which defeats the goal of this algorithm. ;-) – PapaAtHome Nov 02 '12 at 22:53
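The points in these comments can be checked with a small generalization of the answer's `Distance` function: a configurable base and optional weighting of each digit difference by base^position. This is a sketch under the same non-negative-input assumption; the name `DigitDistance.Compute` and its parameters are hypothetical.

```csharp
using System;

public static class DigitDistance
{
    // Sums |digit(x) - digit(y)| over all digit positions in the given base.
    // If weighted is true, the difference at position i is multiplied by base^i.
    // Assumes x, y >= 0 and base >= 2.
    public static long Compute(int x, int y, int numberBase = 10, bool weighted = false)
    {
        if (x < 0 || y < 0) throw new ArgumentOutOfRangeException(x < 0 ? nameof(x) : nameof(y));
        if (numberBase < 2) throw new ArgumentOutOfRangeException(nameof(numberBase));
        long result = 0;
        long weight = 1;
        while (x > 0 || y > 0)
        {
            result += Math.Abs(x % numberBase - y % numberBase) * weight;
            x /= numberBase;
            y /= numberBase;
            if (weighted) weight *= numberBase;
        }
        return result;
    }
}
```

Unweighted in base 10, 68 vs 72 gives 7 while 72 vs 76 gives 4, which is the asymmetry user613326 noticed. Weighted, 456 vs 123 gives 333 (exactly 456 - 123), but 10 vs 9 gives 19 rather than 1, so weighting pushes the result towards the plain difference without always reaching it.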