Every week, my analysts have a spreadsheet of invoices that they need to update with a check number and check date. The checks table exists in SQL Server.

I've written them a macro that iterates through each row of the spreadsheet, opens an ADO recordset using a statement like this:

SELECT CheckNumber, CheckDate FROM CHECKS WHERE Invoice_Number = " & cells (i,2)

... and then uses the fields from the recordset to write the number and date to the first two columns of that row in the Excel spreadsheet.

The code performs acceptably for a few hundred rows, but is slow when there are thousands of rows.

Is there a faster way to update an Excel spreadsheet than with a row-by-row lookup using ADO? For example, is there a way to do a SQL join between the spreadsheet and the table in SQL Server?

Edit: In response to Jeeped's questions, here's a bit of clarification.

What I'm really trying to do is find a way to "batch" update an Excel spreadsheet with information from SQL Server, instead of executing SQL lookups and writing the results one row at a time. Is there a way to do the equivalent of a join and return the entire result set in a single recordset?

The Invoice example above really represents a class of problems that I encounter daily. The end users have a spreadsheet that contains their working data (e.g. invoices) and they want me to add information from a SQL Server table to it. For example, "Using the invoice number in column C, add the check number for that invoice in column A, and the check date in column B". Another example might be "For each invoice in column B, add the purchase order number to column A."

The Excel source column would be either a number or text. The "match" column in the SQL table would be of a corresponding data type, either varchar or integer. The data is properly normalized, indexed, etc. The updates would normally affect a few hundred or thousand rows, although sometimes there will be as many as twenty to thirty thousand.

If I can find a way to batch rows, I'll probably turn this into an Excel add-in to simplify the process. For that reason, I'd like to stay in VBA because my power users can extend or modify it to meet their needs--I'd rather not write it in a .NET language because then we need to dedicate developer time to modifying and deploying it. The security of the Excel application is not a concern here because the users already have access to the data through ODBC linked tables in an MS Access database and we have taken appropriate security precautions on the SQL Server.

Moving the process to SSIS would require a repeatability that doesn't exist in the actual business process.

DataWriter
    Your problem is the repeated calls for individual records. Pull all the records at once into a variant array and process the worksheet from the memory block. –  May 03 '15 at 20:44
  • That's the solution I'm looking for--batch processing, but I don't understand how to send the thousands of requests to SQL server in a batch. – DataWriter May 03 '15 at 21:17
  • You're going to have to provide more in the way of specifics before an answer can be offered. Are all of the invoice numbers in the worksheet processed or is there a subset? What column are the invoice numbers in? Are they numeric or alphanumeric? What other fields do you need to update information based on invoice number? How many records is a typical update.. 1K, 10K, 100K? I could go on but you can see how your current question lacks many important factors. –  May 03 '15 at 21:18
  • Not an answer to your question, but I hope you don't wind up with `x; DROP TABLE CHECKS;` in a cell somewhere... See [Non-web SQL Injection](http://stackoverflow.com/questions/512174/non-web-sql-injection). – Comintern May 03 '15 at 21:26
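As a side note on Comintern's injection warning: concatenating cell values into SQL can be avoided entirely with a parameterized `ADODB.Command`. This is only a sketch under the question's assumptions (the `CHECKS` table and `Invoice_Number` column from the post; the connection, parameter name, and varchar size are hypothetical):

```vba
'Sketch: parameterized lookup instead of string concatenation.
'Table/column names come from the question; size 50 for the varchar is an assumption.
Private Function GetCheckInfo(cn As ADODB.Connection, invoiceNum As String) As ADODB.Recordset
    Dim cmd As ADODB.Command

    Set cmd = New ADODB.Command
    cmd.ActiveConnection = cn
    cmd.CommandType = adCmdText
    cmd.CommandText = "SELECT CheckNumber, CheckDate FROM CHECKS WHERE Invoice_Number = ?"
    cmd.Parameters.Append cmd.CreateParameter("InvNum", adVarChar, adParamInput, 50, invoiceNum)

    Set GetCheckInfo = cmd.Execute 'the provider handles quoting/escaping of the value
End Function
```

This doesn't address the row-by-row performance problem, but it removes the quoting and injection risk from each lookup.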

2 Answers

In the past I've had success with pulling all of the data from SQL Server into a client-side, disconnected ADO recordset. I then looped once through the entire recordset to build a VBA dictionary, storing the ID value (in this case the invoice number) as the key and the recordset bookmark as the item. Then loop through each worksheet value, checking the invoice number against the dictionary with the `Exists` function. If you find a match, set your recordset's `Bookmark` to the stored value and update the cells on the spreadsheet from the recordset's fields. Assuming the Invoice table isn't a few million rows, this method should prove speedy.

EDIT: Added batch processing to try to limit returned records from large datasets. (Untested Code Sample)

Public Sub UpdateInvoiceData(invoiceNumRng As Range)
'References: Microsoft ActiveX Data Objects x.x
'References: Microsoft Scripting Runtime

    Dim cell As Range
    Dim tempCells As Collection
    Dim sqlRS As ADODB.Recordset
    Dim dict As Scripting.Dictionary
    Dim iCell As Range
    Dim testInvoiceNum As String
    Dim inClause As String
    Dim i As Long

    Set tempCells = New Collection
    i = 1

    For Each cell In invoiceNumRng

        tempCells.Add cell 'queue this cell for the current batch

        If tempCells.Count = 25 Or i = invoiceNumRng.Cells.Count Then 'break up loop into batches of 25:: Modify batch size here, try to find an optimal size.

            inClause = CreateInClause(tempCells) 'limit sql query to this batch's invoice numbers
            Set sqlRS = GetInvoiceRS(inClause) 'retrieve batch results
            Set dict = CreateInvoiceDict(sqlRS) 'create our lookup dictionary

            For Each iCell In tempCells

                testInvoiceNum = iCell.Value 'get the invoice number to test

                If dict.Exists(testInvoiceNum) Then 'test for match

                    sqlRS.Bookmark = dict.Item(testInvoiceNum) 'move our recordset pointer to the correct item
                    iCell.Offset(0, 1).Value = sqlRS.Fields("CheckNum").Value
                    iCell.Offset(0, 2).Value = sqlRS.Fields("CheckDate").Value

                End If
            Next iCell

            Set tempCells = New Collection 'start the next batch
        End If

        i = i + 1 'our counter to determine when the last cell is reached
    Next cell

End Sub

Private Function CreateInClause(cells As Collection) As String

    Dim retStr As String
    Dim tempCell As Range

    retStr = ""

    For Each tempCell In cells
        'assumes textual values; double up embedded single quotes, or omit the quotes entirely for numeric IDs
        retStr = retStr & "'" & Replace(tempCell.Value, "'", "''") & "'" & ", "
    Next tempCell

    If Len(retStr) > 0 Then
        CreateInClause = Left(retStr, Len(retStr) - 2) 'trim off last comma value
    Else
        CreateInClause = "" 'no items
    End If
End Function

Private Function GetInvoiceRS(inClause As String) As ADODB.Recordset
'returns the listing of InvoiceData from SQL

    Dim cn As ADODB.Connection
    Dim rs As ADODB.Recordset
    Dim sql As String

    Set cn = New ADODB.Connection
    cn.ConnectionString = "Your Connection String"

    sql = "SELECT * FROM [Invoices] WHERE InvoiceID IN (" & inClause & ")"

    cn.Open

    Set rs = New ADODB.Recordset
    rs.CursorLocation = adUseClient 'use clientside cursor since we will want to loop in memory
    rs.CursorType = adOpenStatic 'clientside cursors are always static

    rs.Open sql, cn

    Set rs.ActiveConnection = Nothing 'disconnect from connection here

    cn.Close

    Set GetInvoiceRS = rs
End Function

Private Function CreateInvoiceDict(dataRS As ADODB.Recordset) As Dictionary

    Dim dict As Scripting.Dictionary

    Set dict = New Scripting.Dictionary

    If dataRS.BOF And dataRS.EOF Then
        'no data to process
    Else
        dataRS.MoveFirst 'make sure we are on first item in recordset
    End If

    Do While Not dataRS.EOF

        dict.Add CStr(dataRS.Fields("InvoiceNum").Value), dataRS.Bookmark

        dataRS.MoveNext
    Loop

    Set CreateInvoiceDict = dict
End Function
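A possible call, assuming the invoice numbers live in a single column (the sheet name and range here are hypothetical; the routine writes the check number and date one and two columns to the *right* of each invoice cell, per the `Offset` calls above):

```vba
Sub RunUpdate()
    'Example: invoice numbers in C2:C5000; CheckNum lands in D, CheckDate in E
    UpdateInvoiceData Worksheets("Invoices").Range("C2:C5000")
End Sub
```

If the destination columns need to be to the left (as in the question's column A/B layout), change the `Offset` arguments to negative column offsets.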
Fink
  • Thank you, this is a good idea that I can definitely use! Unfortunately, in this case, my check and PO tables have millions of rows :( – DataWriter May 04 '15 at 17:20
  • Hmmm, a bit more tricky with that large of a dataset. You can try to adapt this to run in batches of maybe 25-50 items, generating a dynamic SQL 'IN' clause to try to limit the amount of rows returned. I'll try to get an edit to my answer to demonstrate. – Fink May 05 '15 at 15:36
  • Thank you. I really like your approach and I've been thinking about ways to batch and limit the data. If I can use the dates, suppliers, etc. in the spreadsheet I can definitely bring back smaller recordsets from the check table. I'm going to experiment with it later today, if I can. – DataWriter May 05 '15 at 15:39
  • added a new sample, hopefully that will help you out, Goodluck! – Fink May 05 '15 at 15:57
  • I haven't had a chance to test this extensively but I like the approach, so I'm marking it solved. Thank you for your contribution! – DataWriter May 15 '15 at 13:29
The best way to do this is to use SSIS to insert the information into a range in the spreadsheet. Remember that SSIS expects the target range to be empty, and the row above the target range should also be empty. If you do this, you can schedule the SSIS job through the Windows Task Scheduler.

benjamin moskovits