0

I created a macro enabled Excel file that performs the following actions (high level):

  1. User selects template file (which is itself macro enabled) via file dialog
  2. User selects data files (which are not macro enabled) also via file dialog
  3. Macro steps through data files and, one by one, opens them, formats the data, migrates the data into a new worksheet in an intermediary workbook, and then closes the data file without saving it
  4. Once all files have been looped through, the intermediary workbook is also saved, but kept open
  5. Once all the data files have been looped through, each sheet of the intermediary workbook is looped through, the data in the current worksheet is transferred to the template file, and the template file is saved as a new, uniquely labeled file. One row of data in this now-data-containing file is copied into a summary sheet

The number of data files being selected are in the thousands (so far the biggest run we've attempted is 4000 files). As the macro progresses, the time it takes for these files to save gets slowly but steadily longer. It starts at about five seconds, but by the end some of the files are taking about five minutes to save.

I added an iteration feature that, once all the data files have been looped through, it completely closes the template file and opens a new instance of it with different settings, and then starts the process over again.
This causes the save time to go back to normal.
The summary file is also saved and closed during this step, and a new one opened for the new iteration.

I've considered closing and reopening the template file every hundred data files or so, but I'd rather get a proper solution.
If I open and close the template file every time, I avoid the time problem, but then the macro becomes wildly unstable, where it will sometimes crash at random points.

This is on a computer that is isolated from the internet or any kind of network and is saving to a solid state drive (we've tried to control for quite a few variables).

Option Explicit

Public Sub Example()
    Dim Trial As Integer, Trials As Integer, DataSet As Integer
    Dim TrialChecker As Boolean
    Dim StartTime As Double, WaitTime As Double
    Dim StartDate As Date
    Dim FileSaveName As String
    Dim CopiedDataRange As Range
    Dim SummaryRunTimes As Worksheet, Calcs As Worksheet, CutoffsShifts As Worksheet
    Dim SheetObjects() As Worksheet
    Dim IntermediaryWorkbook As Workbook, Summary As Workbook, Template As Workbook
    
    Application.ScreenUpdating = False
    Application.Calculation = xlCalculationManual
    
    'The 1 and Trials are actually set by Lbound and Ubound funcitons, but the premise is the same
    For Trial = 1 To Trials
        Workbooks.Add
        Set Summary = ActiveWorkbook
        'I use this one sheet to keep track of how long different parts of the code take to run
        Set SummaryRunTimes = Summary.Worksheets(1)
        SummaryRunTimes.Name = "Run Times"
        SummaryRunTimes.Cells(1, 1).Value = "ID"
        SummaryRunTimes.Cells(1, 2).Value = "Data Copy Time (s)"
        SummaryRunTimes.Cells(1, 3).Value = "Formula Copy and Calc Time (s)"
        SummaryRunTimes.Cells(1, 4).Value = "Summary Copy Time (s)"
        SummaryRunTimes.Cells(1, 5).Value = "Save and Cleanup Time (s)"
        
        'sheetnames is defined elsewhere in the code (it's a global variable right now. I intend to change that later).
        'It's simply an array of strings with six elements.
        For Counter = LBound(sheetnames) To UBound(sheetnames)
            Summary.Worksheets.Add
            Summary.ActiveSheet.Name = sheetnames(Counter)
        Next Counter
        
        'Again, TemplateLocation is defined elsewhere. It's just a string grabbed from a filedialog
        Workbooks.Open (TemplateLocation)
        Set Template = ActiveWorkbook
        Set Calcs = Template.Sheets("Calcs")
        Set CutoffsShifts = Template.Sheets("Log Cutoffs & Shifts")
        
        'SheetObjects is simply used as a convenient reference for various sheets in the template file. I found
        'it cleaned up the code a bit. Some might say it's unnecessary.
        For Counter = LBound(sheetnames) To UBound(sheetnames)
            Set SheetObjects(Counter) = Template.Sheets(sheetnames(Counter))
        Next Counter
        
        'This is where the parameters for the given trial are set in the template file. Trialchecker is set elsewhere
        '(it checks a yes/no dropdown in the original spreadsheet). ParameterAddresses is a range that's grabbed from a
        'table object in the original spreadsheet and contains where these parameters go in the template file. These
        'will not change depending on the trial, thus column = 1. TrialParameters is in the same table, and are the
        'parameters themselves. These DO depend on the trial, and thus the column is equal to the trial number
        If TrialChecker = True Then
            For Counter = LBound(ParameterAddresses) To UBound(ParameterAddresses)
                CutoffsShifts.Range(ParameterAddresses(Counter, 1)).Value = TrialParameters(Counter, Trial)
            Next Counter
        End If
        
        For DataSet = 1 To IntermediaryWorkbook.Worksheets.Count - 1
            'This is where I start my timers
            StartTime = Timer
            StartDate = Date
            
            'This is where the data is actually copied from the intermediary file into the template. It's always five
            'columns wide, but can be any number of rows. the SummaryRunTimes statement is merely grabbing the unique
            'identifier of that given worksheet
            With IntermediaryWorkbook
                Set CopiedDataRange = Calcs.Range("$A$3:$E$" & .Worksheets(Counter).UsedRange.Rows.Count + 1)
                CopiedDataRange.Value = IntermediaryWorkbook.Worksheets(Counter).Range("$A$2:$E$" & .Worksheets(Counter).UsedRange.Rows.Count).Value
                SummaryRunTimes.Cells(Counter + 1, 1) = Calcs.Cells(3, 1).Value
            End With
            
            'First timestamp
            SummaryRunTimes.Cells(Counter + 1, 2) = CStr(Round(86400 * (Date - StartDate) + Timer - StartTime, 1))
            StartTime = Timer
            StartDate = Date
            
            'This statement copies down the formulas that go with the data (which is aobut 100 columsn worth of formuals).
            'Throughout this process, calculation is set to manual, so calculation is manually triggered here (Don't ask
            'me why I do it twice. If I recall, it's because pivot tables are weird)
            Set CopiedFormulaRange = Calcs.Range("$F$3:$KL$" & Calcs.UsedRange.Rows.Count)
            CopiedFormulaRange.FillDown
            Application.Calculate
            Template.RefreshAll
            Application.Calculate
            
            'Second timestamp
            SummaryRunTimes.Cells(Counter + 1, 3) = CStr(Round(86400 * (Date - StartDate) + Timer - StartTime, 1))
            StartTime = Timer
            StartDate = Date
            
            'This is a separate function that copies data from the template file into the summary sheet.
            'I know you can't see the code, but just know that it only copies six sets of seven cells, so
            'as far as I can tell, it's not what is causing the problem. The timestamp supports this idea, as
            'it's consistent and short
            Call SummaryPopulate(Summary, sheetnames, SheetObjects, r)
            r = r + 1
            
            'Third timestamp
            SummaryRunTimes.Cells(Counter + 1, 4) = CStr(Round(86400 * (Date - StartDate) + Timer - StartTime, 1))
            StartTime = Timer
            StartDate = Date
            
            'These following few lines are meant to save the template file as a new file. As I mentioned, this is where
            'things get bogged down. FileNameSuffix is a string set via a InputBox. TrialNames is set via the table object
            'mentioned above, and is an array of strings.
            Application.DisplayAlerts = False
            
            If TrialChecker = True Then
                FileSaveName = FolderLocation & "\" & Replace(Calcs.Cells(3, 1).Value, "/", " ") & " OOIP " & FileNameSuffix & " - " & TrialNames(1, Trial) & ".xlsm"
            Else
                FileSaveName = FolderLocation & "\" & Replace(Calcs.Cells(3, 1).Value, "/", " ") & " OOIP " & FileNameSuffix & ".xlsm"
            End If
            
            Template.SaveAs Filename:=FileSaveName, ConflictResolution:=xlLocalSessionChanges
    
            Application.DisplayAlerts = True
            
            'This part clears the copied data and formulas. I added the two Set Nothing lines in the hopes that it would
            'solve my problem, but it doesn't seem to do anything
            CopiedDataRange.ClearContents
            CopiedDataRange.Offset(1, 0).Rows.Delete
            Set CopiedDataRange = Nothing
            Set CopiedFormulaRange = Nothing
            
            'Fourth and final timestamp
            SummaryRunTimes.Cells(Counter + 1, 5) = CStr(Round(86400 * (Date - StartDate) + Timer - StartTime, 1))
            
            'It seems to run a bit better if there's this Wait line here, but I'm not sure why. The WaitTime
            'is grabbed from the original worksheet, and is a Double
            Application.Wait (TimeSerial(Hour(Now()), Minute(Now()), Second(Now()) + WaitTime))
            
        Next DataSet
        
        'This but simply saves the summary file and then closes that and the template file. Then the process starts anew.
        'This seems to be the key for resetting something that reduces the run times.
        If TrialChecker = True Then
            Summary.SaveAs Filename:=FolderLocation & "\" & "OOIP Summary " & FileNameSuffix & " - " & TrialNames(1, Trial) & ".xlsx"
        Else
            Summary.SaveAs Filename:=FolderLocation & "\" & "OOIP Summary " & FileNameSuffix & ".xlsx"
        End If
        Template.Close False
        Summary.Close False
    Next Trial
    
    Application.ScreenUpdating = True
    Application.Calculation = xlCalculationAutomatic
    
    IntermediaryWorkbook.Close False
End Sub
Community
  • 1
  • 1
apynn
  • 11
  • 3
  • In your work flow you don't have a step where you save the data files. Are you actually copying data from the data files into the template file (one template file, many data files)? If so, the only saving that takes place would be of the template file, after each data file has been imported. Do you close the data file after import? Perhaps you could review your question above to make it more clear. I also think it will be hard to make anything more than a guess without seeing the code. – Variatus Jul 28 '17 at 03:43
  • How does the Memory looks when you are doing so many Files? How does your Task Manager Lock Like? are there than hundreds of Excel Files Open, or does your Template Excel increases his Memory? – Moosli Jul 28 '17 at 07:01
  • @Variatus You're right, I apologize. I've edited my question to try and provide more detail. And I can include my code, but I'd rather avoid it if possible. I'd have to cut out quite a bit for it to make any sense in this context. What I can say that might be useful before getting to that step is that I am constantly setting and resetting range and worksheet variables. I'm not sure if this makes a difference at all, but I've read that it can clutter up the memory a bit (I tried setting them to Nothing after each use, but this does not seem to work). – apynn Aug 03 '17 at 05:39
  • The iteration at the end which you suspect of causing the problem isn't clear. Why do you need to repeat "the process all over again"? Surely, by this time reference to the data files shouldn't be needed because you (should) have all you want from them in the intermediate workbook. – Variatus Aug 03 '17 at 10:13
  • @Variatus I've attempted to summarize my code as best I can without leaving out anything important, and included comments to help make reading it easier. Hopefully this will help me explain myself. If it's not clear, please say what is not making sense and I'll be sure to fix it. Thanks for your patience! – apynn Aug 09 '17 at 03:25

1 Answers1

0

Sorry to post this as an answer, which it isn't, but I need a bit of space here. I went through your code, found that the IntermediateWorkbook isn't defined and decided that defining it wouldn't make a difference. I feel certain that you have done all I might think of doing and my study of your code will not discover anything you haven't already discovered. Therefore I look for a solution in first separating the processes and then joining them up again in a different way - or perhaps not. This is the key to my "solution": if the parts can't be joined, let them run separately. Therefore the task I set is to create separate parts.

Part 1 This is described in your points 2 to 4, i.e. creation of the Intermediate workbook. You haven't stated why the user must select a template before that workbook is created but if this has some bearing that template can be opened and closed. The important part of my suggestion is to end the process when the Intermediate workbook is saved. Close it. Close the template. And the project is done - part 1.

Part 2 Open the Intermediate file and loop through its data, creating new files. Each of these files is based on a template. You may have to provide code to enable selection of the correct template if there are several to choose form and if the data in the Intermediate workbook don't support automatic selection. In this process you only have the Intermediate workbook open plus one new file at a time. Each file is closed before a new one is created. At the end of this process the Intermediate file is closed as well. (BTW it occurs to me that your handling of the template might be the cause of your problem. In my process description the template is never opened. Instead, new workbooks are created based on it, and that is the inventor's design.)

Part 3 Create or open the Summary file. Open each of the newly created workbooks and copy one row into the Summary. Then close each workbook and open the next. At the end of the process close the Summary workbook.

Joining the parts: Frankly, I would try to integrate part 3 into part 2 from the outset. I don't believe that having one extra workbook open would make a difference. But if it does then split the tasks.

Your two or three separate procedures should be in an Add-in perhaps or a workbook that does nothing but hold the code (adding one more open workbook to the two or three others - something Excel can handle easily). To the code in this workbook you add a sub which calls the two or three procs in turn, one after the other.

In this program structure your problem might resurface in Part 2 when it might take progressively more time to save each new workbook. If that happens the nature of the problem will have shifted and should be easier to understand and, hopefully, easier to solve.

Variatus
  • 14,293
  • 2
  • 14
  • 30