3

I have several quite large data tables in a single workbook and need to make an Excel user-defined function to lookup data in those tables. I need to be able to provide a variable number of key columns and key values to search for and the function needs to find the first matching row, then use this offset to lookup a value in corresponding data columns.

Kind of like a multi-key version of VLOOKUP. I know you can create a single key, made up of the individual keys appended together in column 1 but I want to avoid having to do this.

I have created several versions of this function and the best one so far works fine except it is slow! One of my tables is nearly 9,000 rows by 11 columns and I'm using a 6-field key. I have about 18,000 occurrences of this formula and recalcing the worksheet takes about 3 minutes (I worked out there are 970 million calcs involved so of course it would be slow).

It uses Evaluate() as many solutions that I have seen also use. Here's the code:

Function KeyLookup(datatable As Variant, datacol As String, _
                   key1table As Variant, key1 As String, _
                   Optional key2table As Variant, Optional key2 As String, _
                   Optional key3table As Variant, Optional key3 As String, _
                   Optional key4table As Variant, Optional key4 As String, _
                   Optional key5table As Variant, Optional key5 As String, _
                   Optional key6table As Variant, Optional key6 As String) As Variant
    Dim cmd As String

    cmd = "INDEX(" & datatable.Address & ",MATCH(1,("
    cmd = cmd & key1table.Address & "=""" & key1 & """)"
    If Not IsMissing(key2table) Then cmd = cmd & "*(" & key2table.Address & "=""" & key2 & """)"
    If Not IsMissing(key3table) Then cmd = cmd & "*(" & key3table.Address & "=""" & key3 & """)"
    If Not IsMissing(key4table) Then cmd = cmd & "*(" & key4table.Address & "=""" & key4 & """)"
    If Not IsMissing(key5table) Then cmd = cmd & "*(" & key5table.Address & "=""" & key5 & """)"
    If Not IsMissing(key6table) Then cmd = cmd & "*(" & key6table.Address & "=""" & key6 & """)"
    cmd = cmd & ",0)," & datacol & ")"

    KeyLookup = Evaluate(cmd)
End Function

This generates cmd values that look like this:

INDEX($K$3:$L$8993,MATCH(1,($B$3:$B$8993="a1-5")*($C$3:$C$8993="Tarp")*($E$3:$E$8993="Sydney")*($F$3:$F$8993="Highest Reach")*($G$3:$G$8993="1+")*($J$3:$J$8993="T0"),1),1)

I need some help to make this as fast as possible. 3 minutes is much too slow.

As mentioned, I want to avoid VLOOKUP() based solutions as I don't want to have to pre-calculate the combined keys.

I also want to avoid SUMPRODUCT solutions which only work with numbers and don't return the first value but sum all values if multiple matches are found.

I also can't rely on third-party add-ins, even though I know some good ones exist.

So, my current thought is to use WorksheetFunction.Index()/Match() natively, therefore removing Evaluate() because I know this adds a significant overhead.

However, I have been unsuccessful in my attempts to remove Evaluate(). Can anyone assist me here?

It seems WorksheetFunction.Index()/Match() in VBA only supports a single range and single key unless someone can explain how to implement that lovely (range1=key1)*(range2=key2)... notation that the MATCH function in worksheets is blessed with, but WorksheetFunction.Match() somehow isn't.

GreggR
  • 41
  • 3
  • What's the question?? – Jean-François Corbett Nov 13 '13 at 08:14
  • Sorry, the question was a bit buried...it's here: "To speed this up, I want to use INDEX/MATCH natively, without Evaluate() because I know this adds a significant overhead.". So... a) I want to remove the Evaluate() function, but I can't translate the (range1=key1)*(range2=key2)... notation that Evaluate() accepts into pure VBA? Is this possible? b) Any other tips for speeding this up? – GreggR Nov 13 '13 at 09:19
  • I suggest you edit your posted question to make clear what the actual questions are, without having to read through the comments section. – Jean-François Corbett Nov 13 '13 at 10:16
  • Have done that. Hopefully it's clear enough now. – GreggR Nov 13 '13 at 10:42

2 Answers2

0

Very good post, though no real question. ;) When writing code, conditional checks like IF hits the execution time real bad, but often it's the surest way to check. In your above code, every time it's used, the IF Not IsMissing condition is checked 5 times. This causes an exponential increase in load for every check you do (though I can't really tell you how much).

Without editing your code too much, one logic can be applied to skip 5 checks at a time. Instead of checking for existence, check for non-existence instead. Basically, your formula has an optional key2. If key2 does not exist, then key3... key6 won't either. Following this pattern, if key3 does not exist, key4... key6 won't either.

This gives us an immediate advantage. Surely, a variable check instead of five when you don't have other keys is a big leap. However, if you use it for 6 full keys at a time, I will be looking into an entirely different code. Evaluate is a massive killer, and if you're the type to make your UDFs recalculate (ie. Application.Volatile) every time, your calculation times will take an even more massive hit.

For the sake of showing a very minor change, here's my take on your code (UNTESTED):

Function KeyLookup(datatable As Variant, datacol As String, _
                   key1table As Variant, key1 As String, _
                   Optional key2table As Variant, Optional key2 As String, _
                   Optional key3table As Variant, Optional key3 As String, _
                   Optional key4table As Variant, Optional key4 As String, _
                   Optional key5table As Variant, Optional key5 As String, _
                   Optional key6table As Variant, Optional key6 As String) As Variant
    Dim cmd As String

    cmd = "INDEX(" & datatable.Address & ",MATCH(1,(" & key1table.Address & "=""" & key1 & """)"
    cmd2 = ",0)," & datacol & ")"

    If IsMissing(key2table) Then GoTo SkipOthers
    ElseIf IsMissing(key3table) Then
        cmd = cmd & "*(" & key2table.Address & "=""" & key2 & """)"
        GoTo SkipOthers
    ElseIf IsMissing(key4table) Then
        cmd = cmd & "*(" & key3table.Address & "=""" & key3 & """)"
        GoTo SkipOthers
    ElseIf IsMissing(key5table) Then
        cmd = cmd & "*(" & key4table.Address & "=""" & key4 & """)"
        GoTo SkipOthers
    ElseIf IsMissing(key6table) Then
        cmd = cmd & "*(" & key5table.Address & "=""" & key5 & """)"
        GoTo SkipOthers
    Else
        cmd = cmd & "*(" & key6table.Address & "=""" & key6 & """)"
    End If

SkipOthers:
    KeyLookup = Evaluate(cmd & cmd2)

End Function

Good luck if you want to continue down this path. :)

WGS
  • 13,969
  • 4
  • 48
  • 51
  • Thanks for your suggestion. As you can see in the comments above, I have modified my post to make the question more obvious. Your mod reduces the time by a small amount. I am needing to find fairly large improvement if possible and therefore am targeting the removal of Evaluate() which really slows the calculations down. I have proved this using other code. I actually don't think it's possible, but would be happy to be proved wrong. – GreggR Nov 15 '13 at 06:50
0

look into .NET (or alternative) for reconstituting the majority of the code into a external component. change the UDF to simply convert the range into an 2dimensional object array and pass it through. using the return value as the value to pass back to the return call for the UDF function.

Anything else will not yield a significant reduction in processing time.

This is a limitation of the in-process calculation engine. It works well with pure math, but working with excel objects or its functions (other than primitives) tends to take a massive hit. there is even a kb article on limitations of Excel UDFs. Worth a read.

Description of limitations of custom functions in Excel

Anonymous Type
  • 3,051
  • 2
  • 27
  • 45