0

I'm trying to make MATCH function work like FIND function. First of all, I generate the dummy data to be use for testing. Here is the routine I use:

Sub Data_Generator()
Randomize
Dim Data(1 To 100000, 1 To 1)
Dim p As Single
For i = 1 To 100000
    p = Rnd()
    If p < 0.4 Then
        Data(i, 1) = "A"
    ElseIf p >= 0.4 And p <= 0.7 Then
        Data(i, 1) = "B"
    Else
        Data(i, 1) = "C"
    End If
Next i
Range("A1:A100000") = Data
End Sub

Now, I create a sub-routine to find the string A in the range Data. There are two methods I use here that employ MATCH function. The first method is to reset the range of lookup array like the following code:

Sub Find_Match_1()
T0 = Timer
Dim i As Long, j As Long, k As Long, Data As Range
Dim Output(1 To 100000, 1 To 1)
On Error GoTo Finish
Do
    Set Data = Range(Cells(j + 1, 1), "A100000")   'Reset the range of lookup array
    i = WorksheetFunction.Match("A", Data, 0)
    j = j + i
    Output(j, 1) = j                               'Label the position of A
    k = k + 1                                      'Counting the number of [A] found
Loop
Finish:
    Range("B1:B100000") = Output
    InputBox "The number of [A] found are " & k & " in", "Process is complete", Timer - T0
End Sub

And for the second method, I assign the cell of range where A is located by value vbNullString instead of resetting Range("A1:A100000"). The idea is to delete the string A after being found and to expect MATCH function to find the next string A in the Range("A1:A100000"). Here is the code to implement the second method:

Sub Find_Match_2()
T0 = Timer
Dim n As Long, i As Long, j As Long
Dim Data_Store()
Dim Output(1 To 100000, 1 To 1)
Data_Store = Range("A1:A100000")
On Error GoTo Finish
Do
    j = WorksheetFunction.Match("A", Range("A1:A100000"), 0)
    Output(j, 1) = j
    Cells(j, 1) = vbNullString
    n = n + 1
Loop
Finish:
Range("A1:A100000") = Data_Store
Range("B1:B100000") = Output
InputBox "The number of [A] found  are " & n & " in", "Process is complete", Timer - T0
End Sub

The goal is to determine which method is better at employing MATCH function in its performance. It turns out the first method only completes less than 0.4 seconds meanwhile the second method completes about a minute on my PC. So my questions are:

  1. Why does the second method take time too long to complete?
  2. How does one improve the performance of the second method?
  3. Can MATCH function be used in an array?
  • 4
    I'm voting to close this question as off-topic because it's better suited for [Code Review](http://codereview.stackexchange.com) – Scott Holtzman Jul 25 '16 at 15:14
  • The individual writes to the sheet to clear each cell with A in the second method are what's adding the overhead, and yes you can use Match on an array. There are a bunch of different approaches you can take to find things on a worksheet: which is best depends on several factors, such as the "cardinality" of the data in the area being searched, how large the area is, and how many times you'll be calling the "find" method. This thread has some useful info and comments: https://stackoverflow.com/questions/18754096/matching-values-in-string-array/18769246#18769246 – Tim Williams Jul 25 '16 at 16:07
  • @TimWilliams I'm well aware of that answer and I've upvoted. I even made a [comment referred to that answer below your answer to my post](https://stackoverflow.com/questions/38128055/vlookup-like-function-select-case-for-a-long-list-in-vba-excel?noredirect=1&lq=1#comment63703141_38128449) :) Could you give me references to use MATCH in an array? I couldn't find it – Anastasiya-Romanova 秀 Jul 26 '16 at 01:59
  • `Dim m: m = Application.Match(valueHere, arrayHere, 0)` You can then test m using IsError(m) - will be True if no match was found, otherwise m will be the location of the first match. – Tim Williams Jul 26 '16 at 02:07
  • @TimWilliams Sub Quick_Find_2() Dim i As Long, j As Long, k As Long, n As Long Dim Data() Dim Output(1 To 100000, 1 To 1) Data = Range("A1:A100000") Do j = Application.Match("A", Data, 0) If IsError(j) = True Then Exit Do Else k = k + j ReDim Data(1 To 10000 - k, 1 To 1) Data = Range(Cells(k + 1, 1), "A100000") Output(k, 1) = k n = n + 1 End If Loop Range("B1:B100000") = Output InputBox "The number of [A] found are " & n, "Process is complete" End Sub but the run-time error 13 kept occurred. – Anastasiya-Romanova 秀 Jul 26 '16 at 02:35
  • Dim j As Variant, not Long – Tim Williams Jul 26 '16 at 02:57
  • Still error pointed at `If IsError(j) = True Then`. I changed to `If IsError(j) Then` doesn't work either – Anastasiya-Romanova 秀 Jul 26 '16 at 03:01
  • Works for me. You will find however that running Match repeatedly on a large array is very slow: a simple loop through your sample data completes for me in about 0.1 sec. As Charles Williams points out in the comments of the earlier discussion, Match is fastest when the data is on a worksheet: it's *much* slower against an array. – Tim Williams Jul 26 '16 at 06:42
  • @TimWilliams Now it works but the code couldn't find any A. It kept popping out "the number of A found is 0". I did make a routine using looping through an array like in [the comment below answer](http://stackoverflow.com/questions/38571256/on-making-match-function-like-find-function?noredirect=1#comment64549021_38572853). I just try to make the MATCH function looks like FIND function, that's all. – Anastasiya-Romanova 秀 Jul 26 '16 at 07:01

1 Answers1

1

I agree that this is more of a Code Review question, but I chose to look into it for my own curiosity, so I'll share what I found.

I think you're hitting a very classic case of N vs N^2 computational complexity. Look at your two methods, which seem remarkably similar, and consider what they're actually doing, keeping in mind that the MATCH function is probably just a linear search when you use Match_type=0 (because your data is unsorted, whereas other match types could do a binary search on your sorted data).

Method 1:

  • Start at A1
  • Continue down the range until an "A" is found
  • Restart at the cell below the MATCH

Method 2:

  • Start at A1
  • Continue down the range until an "A" is found
  • Clear the "A"
  • Restart at A1

It should be instantly apparent that while one method is continually shrinking the range it searches, the other is always starting at the first cell and searching the whole range. This will account for some of the speedup, and already boosts Method 1 to a nice lead, but it's not even nearly the full story.

The real key lies in the amount of work Match has to do for each situation. Because its range constantly shrinks and moves its start further down the list, whichever cell Method 1's Match starts from, it only has to search a small number of cells before it hits an A and resumes the outer loop. Meanwhile, Method 2 is continually destroying A's, making them less and less dense and forcing itself to search more and more of the range before getting any hits. By the end, Method 2 is looping through almost 100,000 empty cells/B's/C's before finding its next A.

So on average, the Match for Method 1 is only looking through a couple of cells each time, while the Match for Method 2 is taking longer and longer as time goes on, until the end when it is forced to loop through the entire range. On top of that, Method 2 is doing a bunch of writes to cell values, which is slower than you might think when you have to do it tens of thousands of times.

In all honesty, your best bet would be to just loop through the cells yourself once, looking for A's and handling them as you go. MATCH brings no advantage to the table, and Method 1 is basically just a more complicated version of the loop I described.

I'd write this something like:

Sub Find_Match_3()
    T0 = Timer
    Dim k As Long, r As Range
    Dim Output(1 To 100000, 1 To 1)
    For Each r In Range("A1:A100000").Cells
        If r.Value = "A" Then
            Output(r.Row, 1) = r.Row        'Label the position of A
            k = k + 1                       'Counting the number of [A] found
        End If
    Next

    Range("B1:B100000") = Output
    InputBox "The number of [A] found are " & k & " in", "Process is complete", Timer - T0
End Sub

Which is about 30% faster on my machine.

Mikegrann
  • 1,081
  • 7
  • 17
  • Thanks for your answer. I did have knowledge to the code that you posted. I even have a better performance than that that I used for answering so questions here, like this for example: 'Sub Quick_Find() T0 = Timer Dim n As Long Dim Data() Dim Output(1 To 100000, 1 To 1) Data = Range("A1:A100000") For i = 1 To 100000 If Data(i, 1) = "A" Then Output(i, 1) = i n = n + 1 End If Next Range("B1:B100000") = Output InputBox "The number of [A] found are " & k & " in", "Process is complete", Timer - T0 End Sub' – Anastasiya-Romanova 秀 Jul 26 '16 at 02:04
  • I asked this question because I'm interested in MATCH function and tried to make it better if possible. Thanks for your answer. I appreciate it. (+1) – Anastasiya-Romanova 秀 Jul 26 '16 at 02:05