I have 8 pairs of data sets.
Each pair consists of a 1-dimensional array with typically 400 string elements and a worksheet column with typically 2000 string elements.
Within each pair of data sets, I want to check each element of the array against each element of the column for a match.

The simplified version of my approach:

For i = 1 To 8
    For j = 0 To 400
        For k = 0 To 2000
            If Cells(k, i) = myArray[1, then 2, then 3, etc.](j) Then [action]
        Next k
    Next j
Next i

I'm looping through column A's first 2000 cells 400 times, then column B's first 2000 cells 400 times, and so on. That seems redundant, and it comes to roughly 6.4 million cell checks (8 × 400 × 2000).

The actual code is below. It looks for dupes that also have a qualifying boolean. When those conditions are met, it records each match's associated integer value and row number.
It then sums the integer values and replaces each of those integers with the sum.
It does this for every unique name, and then (not shown) repeats the process for the remaining 7 data sets.

For i = 0 To j - 1 'start with the first unique name and search for dupes...
    v = 0 'running qty total
    r = 0 'number of matching rows found so far
    For k = 2 To WULastRow 'go through each cell in the first column
        If Cells(k, colPair + 2) = True And Cells(k, colPair).Text = uniqueNames(i) Then '...if a match is found and the row's boolean is True...
            v = v + Cells(k, colPair + 1).Value '...add its qty to the previous dupes' qtys...
            r = r + 1 'increment the number of matches found
            ReDim Preserve rngMatch(r - 1) 'grow the array that holds the matching row numbers...
            rngMatch(r - 1) = k '...and record this row number in it.
        End If
    Next k

    'if more than one row matched the given unique name (i.e., at least one dupe was found)...
    If r > 1 Then
        For k = 0 To r - 1
            Cells(rngMatch(k), colPair + 1).Value = v '...write the summed quantity into each matching row's qty cell.
        Next k
    End If
Next i 'then, regardless, move on to the next unique name and repeat the process.
AnthonyJS
  • what's the [action]? – Gene Skuratovsky Jan 30 '15 at 21:11
  • the [action] is a bit involved. To go a little past the simple code I listed: alongside each column A of 2000 names is a column B of 2000 integers and a column C of 2000 booleans. If a duplicate is found in a given row of column A and the boolean in that row is True, [action] is to take the integer in column B and sum it with all the other integers for this unique array item on other rows where a dupe was found with a "True" in column C, then replace each integer involved in that addition with the resulting sum. Does that help? – AnthonyJS Jan 30 '15 at 21:25
  • your _"then replace each integer involved"_ is what consumes time, not the size of the loop. – Gene Skuratovsky Jan 30 '15 at 21:29
  • Show the rest of the code, there are probably a couple of other things that are taking time besides the loop. – D_Zab Jan 30 '15 at 21:34
  • @Gene, I don't know, I rarely find any duplicates, so I rarely replace any integers. Does that mean I'm doing unnecessary things in my code (posted above now)? – AnthonyJS Jan 30 '15 at 21:40
  • The description is not quite clear. It would be best if you could show an example worksheet image; do reduce the problem to a dozen rows and half a dozen columns, and a small myArray, and explain the process for this example. Could you do that? As to your question... yes, I suspect you do. – Gene Skuratovsky Jan 30 '15 at 21:41
  • @Gene Skuratovsky I sadly do not have enough reputation to post pictures. Imagine repeating "X" then "Y" then "Z", one letter per cell, starting in A1 and going down to A12. Then each cell in B1:B12 has a "3", C1:C6 has "True" and C7:C12 has "False". After this script, everything would be the same, except B1:B6 would have "6". For each letter, X, Y and Z, there would be 4 duplicates; 2 of each letter's duplicates would be ignored because of the "False". Each remaining duplicate's B value would get summed and replaced (see the grid after these comments). – AnthonyJS Jan 30 '15 at 21:54
  • Aside from optimizing the algorithm itself, have you tried turning off automatic calculation while this is running? Also, I don't see any calls to .Activate or .Select, so turning off screen updates might not have a big impact. One other note that isn't clear from your code: are your variables declared as specific types, or are they Variants? – SilentD Jan 30 '15 at 22:14
  • @SilentD I have not tried turning off auto-calc. I have screen updating off, but you're right, I don't use much Select/Activate. Yes, I left out a lot; the variables are all Dim'ed as specific types (all variables here are Integers, except for the uniqueNames() array, which is String). Thanks for the tip! – AnthonyJS Jan 30 '15 at 22:24
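
A minimal sketch of the settings toggling discussed in the two comments above; these are standard Application properties, and the heavy loops themselves are elided:

'Sketch: suspend recalculation, redraws and events around the heavy loops,
'then restore them afterwards.
Application.ScreenUpdating = False
Application.Calculation = xlCalculationManual
Application.EnableEvents = False

'... matching/summing loops run here ...

Application.EnableEvents = True
Application.Calculation = xlCalculationAutomatic
Application.ScreenUpdating = True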
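
Laid out as a grid, the X/Y/Z example from the comments above (reconstructed from AnthonyJS's description; rows 7 to 12 repeat X, Y, Z with False and are left unchanged):

Row    A (name)   B before   C (bool)   B after
1      X          3          True       6
2      Y          3          True       6
3      Z          3          True       6
4      X          3          True       6
5      Y          3          True       6
6      Z          3          True       6
7-12   X/Y/Z      3          False      3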

2 Answers


There is a large overhead every time you access Cells(i, j).

You need to avoid this by using a single statement to get the data into a Variant array, then looping over the array, and then, if needed, putting it back in one statement. Here is some demo code:

Option Base 1

Dim vArr As Variant
Dim j As Long, k As Long

vArr = Range("A1").Resize(LastRow, LastCol).Value2 'one bulk read from the sheet
For j = 1 To LastRow
    For k = 1 To LastCol
        If vArr(j, k) ... Then vArr(j, k) = ... 'test and modify in memory
    Next k
Next j
Range("A1").Resize(LastRow, LastCol).Value2 = vArr 'one bulk write back
Charles Williams
  • Oh, very interesting, so a 2 dimensional array? I will try this (on Monday at work, now I'm going home haha). Thanks a lot! – AnthonyJS Jan 30 '15 at 22:57

What you need to do is implement some sort of variable-ordering heuristic to order the elements of your arrays (lowest to highest, etc.). Here's an example to get you started:

If I have one array a = [1,8,16] (ordered least to most) and another array b = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16] (again ordered), I can introduce a rule that each search starts at the position where the last search ended. Here's some pseudocode for what I mean:

index = 0
for value in a:
    for j in range(index, len(b)):
        if value == b[j]:
            # do something
            index = j
            break

What this effectively does is let you search an iteratively smaller and smaller space each time you match a value. Hopefully this makes sense; applying a variable-ordering ranking to your b arrays may be tough, though.
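
In the VBA context of the question, a hedged sketch of this pointer-advance scan could look like the following. It assumes a() and b() are both already sorted ascending (for example with the quicksort mentioned in the comments below), and the variable names are illustrative:

'Hypothetical merge-style scan over two sorted arrays a() and b().
Dim ia As Long, ib As Long
ib = LBound(b)
For ia = LBound(a) To UBound(a)
    'advance past b elements that sort before the current a element
    Do While ib <= UBound(b)
        If b(ib) >= a(ia) Then Exit Do
        ib = ib + 1
    Loop
    If ib > UBound(b) Then Exit For 'b is exhausted, nothing left to match
    If b(ib) = a(ia) Then
        '[action] goes here
    End If
Next ia

Because the b pointer never moves backwards, each element of b is visited at most once in total, rather than once per element of a.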

Kyle Coventry
  • Indeed, I believe you can use something called the quicksort algorithm. The original poster probably also needs some handy computer-science algorithms to make his or her problem resolve faster. – Bobort Jan 30 '15 at 22:07
  • @Kyle Booth, Yes this is what I'm looking for. Whenever a duplicate is found, by definition it will not be a duplicate for the next uniqueName() element, so it should be automatically ignored on all following iterations. I think the difficulty will be in ordering the arrays (which is where all the benefits come from it seems). If I sort a dataset, it will mangle the rest of them. This is definitely a good starting point, and I'll check into that "quick sort algorithm" that bobort mentioned. Thanks guys! – AnthonyJS Jan 30 '15 at 22:31
  • @AnthonyJS while not wrong, this A ignores the elephant in the room: accessing the `Cells` object inside the loop is orders of magnitude slower than accessing an array. Charles' answer broadly addresses this, and there are many A's on SO that demonstrate the technique ( [one example](http://stackoverflow.com/a/13423993/445425) ). Another opportunity: `ReDim Preserve` is time-expensive. Do it outside the loop if you can, or in large chunks inside the loop if you must ( [see this example](http://stackoverflow.com/a/7693602/445425) ). Only after applying these tips would I apply Kyle's advice. – chris neilsen Jan 30 '15 at 22:55
  • @chrisneilsen thanks a lot! This is exactly what I was hoping to hear, you seem very knowledgeable about what takes time and how to avoid those things. I will check out the examples you gave me and see what I can come up with. – AnthonyJS Feb 03 '15 at 14:54
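
As a hedged illustration of chris neilsen's chunking tip above (the constants and bounds here are illustrative, not from the question): grow rngMatch in blocks and trim once at the end, so ReDim Preserve runs rarely instead of once per match:

'Hypothetical sketch: grow the match array in chunks of 256.
Const CHUNK As Long = 256
Dim rngMatch() As Long, r As Long
ReDim rngMatch(1 To CHUNK)

'...inside the scan loop, for each match on row k:
r = r + 1
If r > UBound(rngMatch) Then
    ReDim Preserve rngMatch(1 To UBound(rngMatch) + CHUNK) 'reallocate rarely
End If
rngMatch(r) = k

'...after the loop, trim to the actual number of matches:
If r > 0 Then ReDim Preserve rngMatch(1 To r)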