1

I have a problem using a hash function. I have to assign a number (128-bit or 64-bit) to every word in a document, and the hash value of "similarity" must be close to that of "similar". For example, if similarity => 10022 (say), then similar => 10025. Words of the same kind should also get close values: the hash value of "john" should be near that of "michel" or "sita", and so on. Does anybody have an idea how to do this?

Thanks in advance. :)

Joachim Sauer
MrYo

2 Answers

3

It doesn't work that way. First you have to find a general model for the sample values in the available data, and then use that model for the streaming log messages.

Ramesh Karna
0

There is a library called OpenNLP; with it you can find out what type of word you are dealing with. Then, as you said for similar words such as names, you can write a hash function in which names, verbs, and so on each get similar hash values. Thanks.
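A rough sketch of that idea in Python, only to illustrate it: the word's category goes into the top bits of a 64-bit value, so all words of one category land in the same numeric range. Everything named here (`CATEGORY_IDS`, `TOY_TAGS`, `categorize`, `category_hash`) is made up for the example, and the tiny lookup table only stands in for a real tagger such as OpenNLP, which is not called.

```python
import hashlib

# Hypothetical category ids (an assumption for this sketch).
CATEGORY_IDS = {"other": 0, "name": 1, "verb": 2}

# Toy lookup standing in for the output of a real tagger (assumption).
TOY_TAGS = {
    "john": "name", "michel": "name", "sita": "name",
    "run": "verb", "walk": "verb",
}


def categorize(word: str) -> str:
    """Return the word's category, or "other" if it is unknown."""
    return TOY_TAGS.get(word.lower(), "other")


def category_hash(word: str) -> int:
    """Build a 64-bit value: top 8 bits = category id, low 56 bits = word digest."""
    cat_id = CATEGORY_IDS[categorize(word)]
    digest = hashlib.sha256(word.lower().encode("utf-8")).digest()
    low = int.from_bytes(digest[:7], "big")  # 7 bytes = 56 bits
    return (cat_id << 56) | low


if __name__ == "__main__":
    for w in ["john", "michel", "sita", "run", "walk", "error"]:
        print(f"{w:8s} {category_hash(w):#018x}")
```

With this layout, "john", "michel" and "sita" share the same top byte, so their values fall in the same range, while the low bits still tell the words apart. It does not make "similar" and "similarity" close to each other; that would need a different scheme.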

  • 1
    The OpenNLP library works fine for general English, but event logs do not follow general English patterns, so I can't use it on log messages. Do you have any other ideas? – MrYo May 17 '12 at 19:17