I need a way to convert a collection of strings into a single unique string. That is, the resulting string must be different if any of the strings inside the collection has changed.

I'm working on a big solution, so I may not be able to adopt some of the better ideas. The unique string will be used to compare two collections, so different strings must mean different collections. I cannot compare the strings one by one because the order may change, and the solution is already built to return a result based on comparing two strings. This is an add-on: the generated string will be passed as a parameter to that comparison.

Thank you!

Moslem Ben Dhaou

5 Answers

1

What about using a hash function?

BenH
  • @MoslemBenDhaou A cryptographic hash function will almost certainly return unique strings. If you find two strings that hash to the same thing, it would be big news. – BenH Dec 19 '11 at 16:01
  • "Ea" and "FB" collide, for example. It simply depends on the prime number used to hash the strings; with a 32-bit SDK it is often 31, and the collision comes down to the difference between "a" and "B". – Moslem Ben Dhaou Dec 19 '11 at 16:07
  • @MoslemBenDhaou Use a cryptographic hash. – BenH Dec 19 '11 at 16:09
  • @MoslemBenDhaou From wikipedia (http://en.wikipedia.org/wiki/Cryptographic_hash_function), "it is infeasible to find two different messages with the same hash". – BenH Dec 19 '11 at 16:11
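
To make the "Ea"/"FB" remark concrete, here is a minimal sketch of a multiply-by-31 string hash (the scheme Java's String.hashCode uses). Hash31 and WeakHashDemo are hypothetical names written for this illustration, not a .NET API, and .NET's own string.GetHashCode uses a different algorithm. It only shows why a small non-cryptographic hash collides easily; it says nothing about cryptographic hashes.

```csharp
using System;

static class WeakHashDemo
{
    // Hypothetical helper: classic polynomial hash, h = h * 31 + c.
    public static int Hash31(string s)
    {
        int h = 0;
        foreach (char c in s)
            h = unchecked(h * 31 + c); // allow wrap-around on overflow
        return h;
    }
}
```

Both WeakHashDemo.Hash31("Ea") and WeakHashDemo.Hash31("FB") evaluate to 2236, because 69 * 31 + 97 equals 70 * 31 + 66.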
1

Considering your constraints, use a delimited approach:

Pick a delimiter and an escape method. For example, use ; as the delimiter and escape it within strings as \;, and also escape \ as \\.

So this list of strings...

"A;bc"
"D\ef;"

...becomes "A\;bc;D\\ef\;"

It ain't pretty, but considering that it has to be a string, the good old ways of CSV and its brethren aren't all too bad.
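
A rough sketch of this scheme, assuming ; as the delimiter and \ as the escape character as in the example above. DelimitedCodec, Encode and Decode are hypothetical names picked for illustration. The decoder is included only to show that the encoding can be reversed, which is what makes two different lists produce different strings. One caveat: an empty collection and a collection holding a single empty string both encode to "", so handle that case separately if it matters.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

static class DelimitedCodec
{
    const char Separator = ';';
    const char Escape = '\\';

    // Escape the escape character first, then the separator.
    static string EscapeItem(string s) =>
        s.Replace(@"\", @"\\").Replace(";", @"\;");

    public static string Encode(IEnumerable<string> items) =>
        string.Join(";", items.Select(EscapeItem));

    public static List<string> Decode(string encoded)
    {
        var result = new List<string>();
        var current = new StringBuilder();
        for (int i = 0; i < encoded.Length; i++)
        {
            char c = encoded[i];
            if (c == Escape && i + 1 < encoded.Length)
            {
                current.Append(encoded[++i]); // take the escaped character literally
            }
            else if (c == Separator)
            {
                result.Add(current.ToString()); // unescaped separator ends the item
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }
        result.Add(current.ToString());
        return result;
    }
}
```

Encoding the two strings from the example gives A\;bc;D\\ef\; and decoding that string gives the original two items back.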

Robert Giesecke
1

This works by choosing ":" as the separator character and using an escape character to make it clear when a ":" inside a string is not meant as the separator. We therefore just need to escape all our strings before concatenating them with the separator in between. This gives us a unique string for every collection. If we want collections to compare equal regardless of order, all we need to do is sort the collection before doing anything else. I should add that my sample uses LINQ and thus assumes the collection implements IEnumerable<string> and that you have a using directive for System.Linq.

You can wrap that up in a function as follows

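string GetUniqueString(IEnumerable<string> Collection, bool OrderMatters = true, string Escape = "/", string Separator = ":")
{
    if (Escape == Separator)
        throw new ArgumentException("Escape character should never equal separator character because it fails in the case of empty strings");
    if (!OrderMatters)
        Collection = Collection.OrderBy(v => v, StringComparer.Ordinal); // Sorting fixes ordering issues; ordinal comparison keeps it culture-independent.
    // Escape the escape character first, then the separator.
    var escaped = Collection.Select(v => v.Replace(Escape, Escape + Escape).Replace(Separator, Escape + Separator));
    // string.Join returns "" for an empty collection, whereas Aggregate without a seed would throw.
    // Note that an empty collection and a collection holding one empty string both yield "".
    return string.Join(Separator, escaped);
}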
ForbesLindesay
0

By "collection string", do you mean "collection of strings"?

Here's a naive (but working) approach: sort the collection (to eliminate the dependency on order), concatenate the strings, and take a hash of the result (MD5, for instance).

Trivial to implement, but not very clever performance-wise.
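
A minimal sketch of the sort, concatenate, hash idea, with two deliberate changes that are my assumptions rather than part of the description above: each string gets a length prefix so that item boundaries survive concatenation (otherwise {"AB", "C"} and {"A", "BC"} would concatenate to the same text), and SHA-256 is used instead of MD5. CollectionFingerprint and Compute are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

static class CollectionFingerprint
{
    public static string Compute(IEnumerable<string> items)
    {
        var sb = new StringBuilder();
        // Ordinal sort makes the result independent of input order and of culture.
        foreach (var s in items.OrderBy(x => x, StringComparer.Ordinal))
        {
            // Length prefix keeps {"AB", "C"} distinct from {"A", "BC"}.
            sb.Append(s.Length).Append(':').Append(s);
        }
        using (var sha = SHA256.Create())
        {
            byte[] hash = sha.ComputeHash(Encoding.UTF8.GetBytes(sb.ToString()));
            // 32 bytes rendered as 64 hexadecimal characters.
            return BitConverter.ToString(hash).Replace("-", "");
        }
    }
}
```

The result is a fixed-length token regardless of how large the collection is. Strictly speaking a hash can still collide, so this trades guaranteed uniqueness for a short string.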

Sergio Tulentsev
  • MD5 (for example) is a 128-bit number. That's a whole damn lot of different values. Other hashes are even longer. I wouldn't take collisions too seriously. – Sergio Tulentsev Dec 19 '11 at 16:06
  • The actual problem with this solution (as with many solutions proffered) is the corner case of comparing {"AB", "C"} with {"A","BC"}. The hashing part really is fine (but unnecessary) – ForbesLindesay Dec 19 '11 at 16:35
  • You can put a separator of your choice between them. I used hashing as a means to limit the size of this "token", but if that's not a problem for you, then OK, don't do it. – Sergio Tulentsev Dec 19 '11 at 16:39
0

Are you saying that you need to encode a string collection as a string? So, for example, the collection {"abc", "def"} may be encoded as "sDFSDFSDFSD" but {"a", "b"} might be encoded as "SDFeg". If so, and you don't care about unique keys, then you could use something like SHA or MD5.

Sachin Kainth
  • Yes, this is what I'm saying, but I need the strings generated from encoding the two collections to always be unique. That's why I can't use hash functions. – Moslem Ben Dhaou Dec 19 '11 at 15:57
  • @Moslem Most hash functions can be considered to be unique unless the sample size is huge, and I mean absolutely vast, but if you don't care about the size of the result then you can just concatenate them. – ForbesLindesay Dec 19 '11 at 16:04