2

I'm working with a two very large tables and I'm trying to use vba and vlookups to match the data in one with the other.

The worksheets are each one column and are structured like below:

Reference Sheet:
ID_Ref (2/4)
Q1234
W1234
R1234
...

Lookup Sheet
ID_Lookup
Q1234
P1234
X1234
...

I'm interested in identifying the rows that are in the Lookup SHeet but not in the Reference Sheet. I have a huge quantity of rows in each sheet (~130k in the Lookup and 90k in the Reference) and and excel kept crashing when I tried to enter the formulas manually.

As a result I'm trying to work with VBA to automate the process. The issue is that the below function returns 0 for most of the rows and I can't figure out why.

    Sub Vlookup()

    With Sheets("Lookup).Range("B3:B133780")
        .FormulaR1C1 = _
        "=IF(VLOOKUP(RC[-1],2_4!R1C1:R79145C1,1)=RC[-1],VLOOKUP(RC[-1],2_4!R1C1:R79145C2,1,FALSE),VLOOKUP(RC[-1],2_4!R1C1:R79145C2,1,FALSE))"
        .Value = .Value
    End With 
    End Sub

Any help is appreciated!

EDIT 5/13:

I tried the methods mentioned by @HarveyFrench and @Jeeped and got two different results.

Copying down:

=NOT(ISNA(VLOOKUP($A2,'2_4'!$A$1:$A$79145,1, FALSE)))

yields a different number of non-matching records than copying down

=IFERROR(MATCH(A2,'2_4'!$A$1:$A$79145,0),"")

Am I doing something wrong here?

RosenkranzC
  • 45
  • 1
  • 7
  • If it is just to identify the existance I would sooner use a countif rather than a cpu intensive VLookup. If the countif returns zero you know it doesn't exist, anything above zero it exists – Dan Donoghue May 12 '15 at 22:59
  • Depending on the type of data, you might be able to use `Data->Remove Duplicates` to reduce the size of it. This works if you're just testing for an item being in both lists. You could also call the VBA version of the functions `Application.Match` and process the results in VBA without putting the formula in the worksheet. `Application.Match` is ideal here since you just want to test existence. – Byron Wall May 12 '15 at 23:12
  • The [MATCH function](https://support.office.com/en-us/article/match-function-0600e189-9f3c-4e4f-98c1-943a0eb427ca)'s [IFERROR function](https://support.office.com/en-us/article/IFERROR-function-F59BACDC-78BD-4924-91DF-A869D0B08CD5) will return a *zero-length string* which while not truly blank can be counted with the [COUNTBLANK function](https://support.office.com/en-au/article/countblank-function-c72134a2-695e-43ab-a3f3-714aa770f385). Remove the IFERROR wrapper if you want to count #N/A with VLOOKUP's #N/A. –  May 13 '15 at 22:36

3 Answers3

2

Here is yoru formula, it looks dodgy!

'=IF(VLOOKUP(RC[-1],2_4!R1C1:R79145C1,1)=RC[-1]
    ,VLOOKUP(RC[-1],2_4!R1C1:R79145C2,1,FALSE)
    ,VLOOKUP(RC[-1],2_4!R1C1:R79145C2,1,FALSE))

In the cells in the column to the right of the lookup column with values (which I will assume is column A and start in row 2. Try this formula in cell B2 then copy and paste it down:

=NOT(  ISNA(   VLOOKUP($A2,'2_4'!$A$1:$A$79145,1, FALSE)   )   )

This will return true when a value is found or false when it isn't.

I think you need to re-read the manual pages on VLOOKUP as you not using it right.

You might like to use the ribbon>>formula tab>>calculation options to turn of auto calcualtion whilst you are editing, but I don;t think it will be necessary.

Harvey

HarveyFrench
  • 4,440
  • 4
  • 20
  • 36
  • Good point about the vlookup looking off Harvey, @RosenkranzC your formula is actually saying if the result of the vlookup is the cell to the left then return the result of the vlookup........ otherwise also return the result of the vlookup, whilst it is going to give you some semblance of what you want (ultimately a #N/A where none existent) you are running the vlookup for the test then again regardless of the outcome, just the vlookup on its own without the if is the same result with less calculations. – Dan Donoghue May 12 '15 at 23:20
2

A MATCH function is more efficient than a VLOOKUP function or even a COUNTIF function if all you want to do is prove existence.

With Sheets("Lookup).Range("B3:B133780")
    .Formula = "=iferror(match(a3, '2_4'!A:A, 0), """")"
    .cells = .value2
End With 

You will be left with blank cells and cells containing the row numbers of the match on worksheet '2_4' in column B.

  • Although match is better, your example won't work as the cell a3 is hard coded in the text string.... I wouldn't bother using VBA to set the formula, and I would use match which does less than vlookup and will be a bit quicker. vlookup is worth learning though as it is very handy. – HarveyFrench May 12 '15 at 23:27
  • 1
    @HarveyFrench - leaving A3 relative and filling a column of cells at once has hte same effect as a Fill Down; e.g. the A3 will adjust per row. –  May 12 '15 at 23:29
  • Glad I am not the only one to have questioned you on this now Jeeped :). – Dan Donoghue May 12 '15 at 23:48
  • Oooo. I didn't know that. A good tip, although I suspect obmitting the line ".cells = .value2" and leaving the formulas in place would be how RosenkranzC might ue it. – HarveyFrench May 13 '15 at 07:33
  • After a little bit of fiddling, this was the solution with the least amount of processing time. Thank you everyone for your help! – RosenkranzC May 13 '15 at 22:05
1

Here's a different approach using ADO (adapted from this answer but modified to use the Excel 2007 onwards connection string and to identify only the items which are unique to the lookup sheet.) I've assumed that the lookup sheet is called "Lookup", the reference sheet is called "Reference" and the list of items we identify will be output to a sheet called "Data" - replace with the correct names if these are wrong.

In the VBA editor, go to Tools > References and add a reference to "Microsoft ActiveX Data Objects 2.8 Library". Now add this to a standard code module then run it:

Option Explicit

Sub get_employees()

Dim cn As ADODB.Connection
Set cn = New ADODB.Connection

' Connection string for .xlsm files
With cn
    .Provider = "Microsoft.ACE.OLEDB.12.0"
    .ConnectionString = "Data Source=" & ThisWorkbook.FullName & ";" & _
        "Extended Properties=""Excel 12.0 Macro;IMEX=1;HDR=Yes"";"
    .Open
End With

Dim rs As ADODB.Recordset
Set rs = New ADODB.Recordset

rs.Open "SELECT [Lookup$].[ID_Lookup] FROM [Lookup$] LEFT JOIN [Reference$] ON [Lookup$].[ID_Lookup] = " & _
    "[Reference$].[ID_Ref] WHERE [Reference$].[ID_Ref] IS NULL", cn

With Worksheets("Data")
    .Cells(2, 1).CopyFromRecordset rs
End With

rs.Close
cn.Close

End Sub

This tries to match every entry in the lookup sheet with one or more corresponding entries in the reference sheet. It then eliminates all of the matches and just keeps the lookup entries which didn't match.

Community
  • 1
  • 1
barrowc
  • 10,444
  • 1
  • 40
  • 53