
I'm looking for a way to get a hash value from a group of strings such that the same hash is returned no matter what order the strings are in.

One way I guess would be to sort them before hashing. But I wonder if there's something more elegant.

gerbil
  • You could hash each string, then sort the hashes, then hash the hashes. – Jonathon Reinhart Jul 03 '17 at 19:49
  • Another possibility would be to simply combine the hash values with some commutative operator (e.g. addition, multiplication, XOR, ...), which is probably faster than sorting the elements first. If I don't encounter too many collisions, that's usually my go-to solution for such cases. – Tobias Ribizel Jul 03 '17 at 20:01
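
A minimal sketch of the two suggestions in the comments above, assuming `std::hash<std::string>` as the per-string hash. The function names `unordered_group_hash` and `sorted_group_hash` are hypothetical, chosen only for illustration, and the mixing step in the second function is one common pattern rather than a required choice.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical helper: combine per-string hashes with addition.
// Addition is commutative and unsigned overflow wraps around,
// so the result does not depend on the order of the strings.
std::size_t unordered_group_hash(const std::vector<std::string>& group) {
    std::hash<std::string> hasher;
    std::size_t combined = 0;
    for (const std::string& s : group) {
        combined += hasher(s);
    }
    return combined;
}

// Hypothetical helper: hash each string, sort the hashes,
// then mix the sorted hashes into a single value.
std::size_t sorted_group_hash(const std::vector<std::string>& group) {
    std::hash<std::string> hasher;
    std::vector<std::size_t> hashes;
    hashes.reserve(group.size());
    for (const std::string& s : group) {
        hashes.push_back(hasher(s));
    }
    std::sort(hashes.begin(), hashes.end());

    std::size_t seed = hashes.size();
    for (std::size_t h : hashes) {
        // Order-sensitive mixing step; safe here because the input is sorted.
        seed ^= h + 0x9e3779b9 + (seed << 6) + (seed >> 2);
    }
    return seed;
}
```

Addition is used rather than XOR in the first function because XOR cancels duplicates: a group containing the same string twice would contribute nothing for that pair. Either way, a commutative combination is likely to collide more often than hashing a sorted sequence, so it is worth measuring on real data.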

1 Answer


Let's say your strings include only lowercase English letters (a-z), for example:

  • lorem
  • ipsum
  • sit
  • amet

Here you could build a simple character histogram for each column (letter position). The characters are the indexes, and each array value is the number of times that character occurs in the column. For the first column:

vector<int> _1st_col_hist(26, 0);
_1st_col_hist['l' - 'a'] => 1
_1st_col_hist['i' - 'a'] => 1
_1st_col_hist['s' - 'a'] => 1
_1st_col_hist['a' - 'a'] => 1

// all other values will be 0.

Do the same thing for the other columns (letter positions). In the end you will have a 2D vector. To clarify:

vector< vector<int> > my_vector(5, vector<int>(26, 0)); // 5 rows: one per letter position of the longest word above
my_vector[1][25] => is the number of 'z' in the second position of all words in your group.
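
The per-column histogram described above could be built roughly as follows. This is only a sketch: it assumes every string contains nothing but `a`-`z`, and the `Hist` alias and the signature given to `group_of_strings_2_hist` are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical alias: hist[pos][letter] = count of that letter at position pos.
using Hist = std::vector<std::vector<int>>;

// Builds the 2D histogram for one group of strings.
// Assumes every character is a lowercase English letter ('a'..'z').
Hist group_of_strings_2_hist(const std::vector<std::string>& group) {
    // One row per letter position of the longest word.
    std::size_t max_len = 0;
    for (const std::string& word : group) {
        max_len = std::max(max_len, word.size());
    }

    Hist hist(max_len, std::vector<int>(26, 0));
    for (const std::string& word : group) {
        for (std::size_t pos = 0; pos < word.size(); ++pos) {
            ++hist[pos][word[pos] - 'a'];
        }
    }
    return hist;
}
```

Because only counts are stored, the result is the same for any ordering of the words. Note that two different groups can still share a histogram when letters are swapped between words within a column: `{"ab", "cd"}` and `{"ad", "cb"}` produce identical tables.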

OK, so where is the hash table? I would say that your question is more closely related to histograms than to hashing.

This 2D vector is for one group of strings. I assume you have multiple groups, so your lookup table needs another dimension. To check whether the lookup table already contains a new group of strings, you need to write group_of_strings_2_hist and compare_hist_with_old_ones functions: build the histogram of the new group and compare it with each element of your lookup table.
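
The comparison step might look something like the sketch below. It assumes the lookup table is simply a `std::vector` of histograms. The `Hist` alias, the signature, and the "index or -1" return convention are assumptions for illustration, not something the answer specifies.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical alias: hist[pos][letter] = count of that letter at position pos.
using Hist = std::vector<std::vector<int>>;

// Linear scan over the stored histograms.
// Returns the index of the first stored histogram equal to `candidate`,
// or -1 if none matches.
long compare_hist_with_old_ones(const Hist& candidate,
                                const std::vector<Hist>& lookup_table) {
    for (std::size_t i = 0; i < lookup_table.size(); ++i) {
        // std::vector's operator== compares sizes and then every element.
        if (lookup_table[i] == candidate) {
            return static_cast<long>(i);
        }
    }
    return -1;
}
```

A linear scan costs one full histogram comparison per stored group. Since `std::vector` also provides a lexicographic `operator<`, the histograms could instead be used as keys in a `std::map` if the table grows large.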

Dharman
rido