I'm looking for a way to get a hash value from a group of strings, such that the same hash is returned no matter what order the strings are in.
One way, I guess, would be to sort them before hashing, but I wonder if there's something more elegant.
Let's say your strings contain only lowercase English letters (a-z).
Then you could build a simple character histogram for each column (letter position): the character is the index, and the array value is the number of times that character occurs. For example, for a group of four words whose first letters are l, i, s and a:
vector<int> _1st_col_hist(26, 0);
_1st_col_hist['l' - 'a'] => 1
_1st_col_hist['i' - 'a'] => 1
_1st_col_hist['s' - 'a'] => 1
_1st_col_hist['a' - 'a'] => 1
//other values will be 0.
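As a rough sketch of how one such column histogram could be computed, assuming the words contain only a-z: the function name column_hist and the sample words used with it are made up for illustration and are not part of the original answer.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: counts how often each letter occurs at position
// `col` across all words of the group. Words too short to have that
// position are skipped. Assumes every character is in 'a'..'z'.
std::vector<int> column_hist(const std::vector<std::string>& group,
                             std::size_t col) {
    std::vector<int> hist(26, 0);
    for (const std::string& word : group) {
        if (col < word.size()) {
            ++hist[word[col] - 'a'];
        }
    }
    return hist;
}
```

Calling column_hist(group, 0) on words such as "list", "item", "set" and "array" gives the first-column counts shown above, and the result does not depend on the order of the words.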
Do the same thing for the other columns (letter positions). In the end you will have a 2D vector. To clarify:
vector< vector<int> > my_vector(max_len, vector<int>(26, 0)); // max_len: length of the longest word in the group (a name assumed here)
my_vector[1][25] => the number of 'z' characters in the second position across all words in your group.
OK, so where is the hash table? I must say that your question is more related to histograms than to hashing.
This 2D vector is for one group of strings. I assume you have multiple groups, so your vector needs another dimension. To check whether your lookup table already contains a new group of strings, you need to write group_of_strings_2_hist and compare_hist_with_old_ones functions, build the histogram of the new group, and compare it with each element of your lookup table.
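A minimal sketch of those two functions might look like the following. The function names come from the answer, but the signatures, the Hist alias and the "return -1 when nothing matches" convention are assumptions made for illustration.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Assumed alias: hist[position][letter] = count.
using Hist = std::vector<std::vector<int>>;

// Builds the per-position letter histogram for one group of words.
// Assumes every character is in 'a'..'z'.
Hist group_of_strings_2_hist(const std::vector<std::string>& group) {
    // The outer dimension must cover the longest word in the group.
    std::size_t max_len = 0;
    for (const std::string& word : group) {
        if (word.size() > max_len) max_len = word.size();
    }
    Hist hist(max_len, std::vector<int>(26, 0));
    for (const std::string& word : group) {
        for (std::size_t pos = 0; pos < word.size(); ++pos) {
            ++hist[pos][word[pos] - 'a'];
        }
    }
    return hist;
}

// Linear scan over the lookup table. Returns the index of the first
// stored histogram equal to `hist`, or -1 if there is no match.
int compare_hist_with_old_ones(const std::vector<Hist>& table,
                               const Hist& hist) {
    for (std::size_t i = 0; i < table.size(); ++i) {
        if (table[i] == hist) return static_cast<int>(i);
    }
    return -1;
}
```

One caveat worth keeping in mind: different groups can produce identical histograms. For example, {"ab", "cd"} and {"ad", "cb"} have the same counts in every column, so a match here means "same per-column letter counts", not necessarily "same set of strings".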