I have two dictionary objects.

I am trying to intersect DictA with DictB by their values and return a third dictionary with the results.

I am able to do this, but it only produces a list of ints:

    var results = DictA.Values.Intersect(DictB.Values);

This method is wayyy to slow

    var results = DictA.Where(x => DictB.ContainsValue(x.Value)).ToDictionary(x => x.Key, x => x.Value);

Performance is key. Each dictionary holds several million records.

How can I intersect two dictionaries by value and get a third dictionary as the result?

mrb398
  • 'Each dictionary holds several million records'. Right, and you index them by key, and you expect it to be fast finding a specific value? – Patrick Hofman Oct 22 '14 at 14:24
  • How are the Dictionaries defined? If the values are the same would the keys also be the same? If so, then do your intersect on the keys, then you can rebuild a Dictionary using the key values to find the corresponding Values from either of the sources. – Tian van Heerden Oct 22 '14 at 14:25
  • The best option would be to swap the key and value in the dictionary. – Patrick Hofman Oct 22 '14 at 14:26
  • What if you have a dictionary which is broadly `{{a:1},{b:2}}` and the other is `{{a:2},{b:1}}`? Values 1 and 2 are in both dictionaries but I'm not sure what your final dictionary would look like... Perhaps some examples of what you want, using two- or three-entry dictionaries with some of the obvious edge cases, might help us understand your requirements better. – Chris Oct 22 '14 at 14:27
  • @Chris His way-too-slow method would take the keys from the first dictionary, so that might be his requirement. (And at first glance that's what Selman22's answer would do too.) – Rup Oct 22 '14 at 14:30
  • Perhaps a dictionary isn't the best option. Basically I have an array of RecordIds and HashValues. I just need to figure out the fastest way to intersect lists of these values. – mrb398 Oct 22 '14 at 14:40
  • @Rup No, that's not what selman's answer would do. He left it 90% unimplemented, and as such his answer could do any number of possible things, but this is in fact not one of them, as his comparer would have no idea whether a given pair was from the first or second dictionary, which it would need to know to resolve conflicts in the same manner. – Servy Oct 22 '14 at 14:47
  • @Servy OK, but it's fairly obvious his equality comparer just compares .Values and returns .Value.GetHashCode(), isn't it? It's not up to the comparer to resolve duplicates, it's up to Dictionary.Intersect, and [its documentation](http://msdn.microsoft.com/en-us/library/vstudio/bb355408.aspx) says 'the set that contains all the elements of A that also appear in B', i.e. it says it takes the elements from the first input. – Rup Oct 22 '14 at 15:01
  • @Rup He neither showed nor described anything about his implementation. We know nothing about what he thinks it should be. We don't know what the OP intends to happen with duplicate values; based on the requirements we need to determine what should happen for duplicate values with different keys, and that will determine what the equality comparer should return in those cases. What the equality comparer returns will then affect what `Intersect` returns. Currently he hasn't even described what should be done in the case of duplicate values with differing keys. – Servy Oct 22 '14 at 15:04
  • @user1691808: are the hash values the hashes of the RecordIds or are they effectively two independent pieces of data? i.e. if the hash values are the same does that imply the RecordId is the same and vice versa? – Chris Oct 22 '14 at 15:05
  • @Servy I agree it'd be better for the OP to spell that out, but if we infer that his "wayyy to slow" method gives the results he wants then that is all specified. – Rup Oct 22 '14 at 15:13
  • @Rup And it's impossible for selman's code to ever replicate those results, ever, as written. – Servy Oct 22 '14 at 15:14

1 Answer

I think using the dictionary is going to cause you performance issues if you search on values. The key lookup of a dictionary is optimized to give you O(1) performance on average, but the values collection is basically an ICollection, so searching it is O(n). Calling ContainsValue once per entry of the other dictionary therefore costs roughly n × m comparisons.

If you move the values into a HashSet, your performance will improve dramatically. As a rough example:

        var dict1 = new Dictionary<string, string>();
        var dict2 = new Dictionary<string, string>();

        for (var x = 0; x < 1000000; x++)
        {
            dict1.Add(x.ToString(), x.ToString());
        }

        for (var x = 0; x < 2000000; x+=2)
        {
            dict2.Add(x.ToString(), x.ToString());
        }

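        // Copy the values into hash sets so membership checks are O(1) on average
        var hs1 = new HashSet<string>(dict1.Values);
        var hs2 = new HashSet<string>(dict2.Values);

        // hs1 is modified in place and now holds only the values present in both sets
        hs1.IntersectWith(hs2);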
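
To get back to a dictionary rather than a set of values, one option is to build the HashSet from the second dictionary's values only and filter the first dictionary against it. This is a minimal sketch, assuming the result should keep the keys from the first dictionary (as the slow query in the question does). `IntersectByValue` is a hypothetical helper name for illustration, not a framework method:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DictionaryIntersect
{
    // Hypothetical helper (not part of .NET): returns the entries of `first`
    // whose value also occurs as a value somewhere in `second`.
    // Assumption: keys in the result come from `first`.
    public static Dictionary<TKey, TValue> IntersectByValue<TKey, TValue>(
        Dictionary<TKey, TValue> first,
        Dictionary<TKey, TValue> second)
    {
        // One pass over `second` to build a set with O(1) average lookups.
        var secondValues = new HashSet<TValue>(second.Values);

        // One pass over `first`, keeping only pairs whose value is in the set.
        return first
            .Where(pair => secondValues.Contains(pair.Value))
            .ToDictionary(pair => pair.Key, pair => pair.Value);
    }
}
```

Building the set is a single pass over the second dictionary and each `Contains` check is O(1) on average, so the work is roughly linear in the combined size of the two dictionaries instead of n × m. If the same value appears under several keys in the first dictionary, all of those entries are kept.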
Allan Elder
  • @Servy His current code meets his requirements in terms of functionality, but is too slow; as I said, he is searching on values which is causing his performance issues. My point is to use a HashSet instead of doing that, and I gave him rough code on how he could do that in his current code to try it out. – Allan Elder Oct 22 '14 at 14:49
  • He provided two solutions. One doesn't produce the right results, one is too slow. Yours produces identical results as the one that doesn't produce the right results. – Servy Oct 22 '14 at 14:53