We need to find the pairwise intersections of several sorted integer arrays.

Example:

Input:
1,3,7,8
2,3,8,10
3,10,11,12,13,14

minSupport = 1

Output:

1 and 2: 3, 8
1 and 3: 3
2 and 3: 3, 10

I wrote an algorithm for this, and it runs fast:

    var minSupport = 2;
    var elementsCount = 10000;
    var random = new Random(123);

    // The numbers within each array are unique
    var sortedArrays = Enumerable.Range(0, elementsCount)
        .Select(x => Enumerable.Range(0, 30).Select(t => random.Next(1000)).Distinct().ToList())
        .ToList();
    var result = new List<int[]>();
    var resultIntersection = new List<List<int>>();


    foreach (var array in sortedArrays)
    {
        array.Sort();
    }



    var sw = Stopwatch.StartNew();

    //****MAIN PART*****//

    // This number (the exclusive upper bound on the values an array can contain) is known.
    // Of course, we can use a dictionary if we don't know maxValue.
    var maxValue = 1000;

    var reverseIndexDict = new List<int>[maxValue];

    for (int i = 0; i < maxValue; i++)
    {
        reverseIndexDict[i] = new List<int>();
    }

    for (int i = 0; i < sortedArrays.Count; i++)
    {
        for (int j = 0; j < sortedArrays[i].Count; j++)
        {
            reverseIndexDict[sortedArrays[i][j]].Add(i);
        }
    }


    var resultMatrix = new List<int>[sortedArrays.Count,sortedArrays.Count];

    for (int i = 0; i < sortedArrays.Count; i++)
    {   
        for (int j = 0; j < sortedArrays[i].Count; j++)
        {
            var sortedArraysij = sortedArrays[i][j];

            for (int k = 0; k < reverseIndexDict[sortedArraysij].Count; k++)
            {
                // Index of another array that also contains this value
                var other = reverseIndexDict[sortedArraysij][k];

                if (resultMatrix[i, other] == null) resultMatrix[i, other] = new List<int>();

                resultMatrix[i, other].Add(sortedArraysij);
            }
        }
    }


    //*****************//

    sw.Stop();

    Console.WriteLine(sw.Elapsed);

But my code fails with an OutOfMemoryException when elementsCount (the number of arrays) is more than about 10,000. How can I improve my algorithm, or what else can I do to resolve this issue?

Neir0
  • possible duplicate of [Find intersection group of sorted integer arrays](http://stackoverflow.com/questions/10889479/find-intersection-group-of-sorted-integer-arrays) – Mitch Wheat Jun 07 '12 at 02:45
  • @Mitch Wheat Nope. This question is about resolving a concrete issue in my algorithm. The linked question is about finding a fast algorithm. – Neir0 Jun 07 '12 at 02:46
  • looks like the same question to me... – Mitch Wheat Jun 07 '12 at 02:47
  • I will post this as a comment since I won't be providing an implementation. If both of your collections are already sorted, then rather than creating a giant dictionary and comparing everything at once, you should break both collections up into smaller collections covering certain numeric ranges. Using ranges will also let you quickly eliminate a large number of results. I guess this might be quite similar to creating a binary search tree. – row1 Jun 07 '12 at 02:57
  • @row1 Hmm, I am trying to grasp your idea, but can you provide a few lines of pseudocode? – Neir0 Jun 07 '12 at 03:05
  • Given two collections A & B, each with 100 items. Grab the first 10 items from both collections and put them into arrays A1 and B1. You can quickly compare A1[9] with B1[0] and see if they are in the same range, if not then you can ignore A1 and get the next 10 for A and compare against B1. When they are in the same range you can use your current algorithm over these two smaller collections. There is still likely to be a much better solution, but dividing and conquering will at least reduce your memory usage. – row1 Jun 07 '12 at 03:15
  • @row1 I have a few thousand arrays and they are short (about 30 elements). I need to apply your algorithm to each pair. So... if I have 10 000 arrays, that takes about 50 000 000 such operations (10 000 × 9 999 / 2 pairs). – Neir0 Jun 07 '12 at 03:26
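
The range check row1 describes in the comments above can be sketched roughly as follows. This is a minimal illustration, not row1's actual code: because the arrays here are short (about 30 elements), it assumes a single check over the whole of each list rather than chunks of 10, then falls back to a two-pointer merge. The names `RangeCheckDemo` and `IntersectSorted` are made up for this example, and both inputs are assumed to be sorted ascending with unique values.

```csharp
using System;
using System.Collections.Generic;

public static class RangeCheckDemo
{
    // Intersects two ascending lists of unique ints (hypothetical helper).
    public static List<int> IntersectSorted(IList<int> a, IList<int> b)
    {
        var result = new List<int>();
        if (a.Count == 0 || b.Count == 0) return result;

        // Range check: if one list ends before the other begins, they cannot overlap.
        if (a[a.Count - 1] < b[0] || b[b.Count - 1] < a[0]) return result;

        // Two-pointer merge: advance whichever side holds the smaller value.
        int i = 0, j = 0;
        while (i < a.Count && j < b.Count)
        {
            if (a[i] < b[j]) i++;
            else if (a[i] > b[j]) j++;
            else { result.Add(a[i]); i++; j++; }
        }
        return result;
    }

    public static void Main()
    {
        var first = new List<int> { 1, 3, 7, 8 };
        var second = new List<int> { 2, 3, 8, 10 };

        // Prints "3, 8"
        Console.WriteLine(string.Join(", ", IntersectSorted(first, second)));
    }
}
```

This only holds one pair's result in memory at a time, but it still has to be called once per pair of arrays, which is the cost Neir0 points out.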

2 Answers

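Use the Intersect method like this:

    ...
    List<int> theDistinctListOfInts = null;
    foreach (var listOfInts in theListsOfInts)
    {
        // Seed with the first list: intersecting with an empty list always yields an empty result.
        theDistinctListOfInts = theDistinctListOfInts == null
            ? listOfInts.ToList()
            : theDistinctListOfInts.Intersect(listOfInts).ToList();
    }
    ...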
bluevector

If you know the maximum value the arrays can contain, you can count how many arrays each value appears in:

    var histoMatrix = new int[1000]; // values are in the range 0..999 here

    for (int i = 0; i < sortedArrays.Count; i++)
    {
        for (int j = 0; j < sortedArrays[i].Count; j++)
        {
            var sortedArraysij = sortedArrays[i][j];

            // Each array holds unique values, so this counts arrays, not occurrences.
            histoMatrix[sortedArraysij]++;
        }
    }

    var resultMatrix = new List<int>();

    for (int i = 0; i < 1000; i++)
    {
        // A value present in every array belongs to the intersection of all arrays.
        if (histoMatrix[i] == sortedArrays.Count)
            resultMatrix.Add(i); // add the value itself, not its count
    }

In this case you don't even need the arrays to be sorted, as long as the values within each array are unique.
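
When the maximum value is not known in advance, the same counting idea could be expressed with a `Dictionary<int, int>` instead of a fixed-size array. The following is only a sketch under assumptions: `DictionaryHistogramDemo` and `IntersectAll` are illustrative names, each input list is assumed to contain unique values, and the method returns the values common to all lists (not the pairwise intersections asked about in the question).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DictionaryHistogramDemo
{
    // Returns the values present in every list (hypothetical helper).
    // Assumes the values within each list are unique.
    public static List<int> IntersectAll(IEnumerable<IEnumerable<int>> arrays)
    {
        var counts = new Dictionary<int, int>();
        int arrayCount = 0;

        foreach (var array in arrays)
        {
            arrayCount++;
            foreach (var value in array)
            {
                // seen is 0 when the key is not in the dictionary yet
                counts.TryGetValue(value, out var seen);
                counts[value] = seen + 1;
            }
        }

        // A value counted once per list appears in all of them.
        return counts.Where(pair => pair.Value == arrayCount)
                     .Select(pair => pair.Key)
                     .OrderBy(value => value)
                     .ToList();
    }

    public static void Main()
    {
        var arrays = new List<List<int>>
        {
            new List<int> { 1, 3, 7, 8 },
            new List<int> { 2, 3, 8, 10 },
            new List<int> { 3, 10, 11, 12, 13, 14 },
        };

        // Prints "3"
        Console.WriteLine(string.Join(", ", IntersectAll(arrays)));
    }
}
```

Memory use here grows with the number of distinct values rather than with the size of the value range.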

Hope it helps

Vladimir