
I am trying to do some text replacement in VBScript. The problem is that the file is 150 MB. Below are the functions that delete the header row and get rid of blank lines.

As I suspected, it gives me an "out of memory" error at the line `strContents = objTS.ReadAll` in the `cleanHeaderRow` sub. At this point I am not sure whether this task can be done in VBScript at all, but before I start exploring other languages any suggestion would be most welcome.

Sub cleanHeaderRow(browse)
    MsgBox browse
    Const FOR_READING = 1
    Const FOR_WRITING = 2
    'strFileName = "C:\scripts\test.txt"
    strFileName = browse
    iNumberOfLinesToDelete = 1

    Set objFS = CreateObject("Scripting.FileSystemObject")
    Set objTS = objFS.OpenTextFile(strFileName, FOR_READING)
    strContents = objTS.ReadAll
    objTS.Close

    arrLines = Split(strContents, vbNewLine)
    Set objTS = objFS.OpenTextFile(strFileName, FOR_WRITING)

    For i=0 To UBound(arrLines)
        If i > (iNumberOfLinesToDelete - 1) Then
            objTS.WriteLine arrLines(i)
        End If
    Next
    objTS.Close
End Sub

Sub DeleteBlankRows(browse)
    Const ForReading = 1
    Const ForWriting = 2
    Set objFSO = CreateObject("Scripting.FileSystemObject")
    Set objFile = objFSO.OpenTextFile(browse, ForReading)
    Do Until objFile.AtEndOfStream
        strLine = objFile.Readline
        strLine = Trim(strLine)
        If Len(strLine) > 0 Then
            strNewContents = strNewContents & strLine & vbCrLf
        End If
    Loop
    objFile.Close
    Set objFile = objFSO.OpenTextFile(browse, ForWriting)
    objFile.Write strNewContents
    objFile.Close
End Sub
imba22
  • How many lines are there in the text file? – Siddharth Rout Aug 17 '16 at 18:02
  • Excel has 1,048,576 rows, so you may want to try opening it in Excel? – Siddharth Rout Aug 17 '16 at 18:03
  • It's not an Excel file. It's a pipe-separated file with 260,000 rows, and each row has 55 columns. – imba22 Aug 17 '16 at 18:16
  • Import as csv with pipe delimiter? – Raystafarian Aug 17 '16 at 18:41
  • @Raystafarian: Cannot do that. The whole point of writing this question is that I need to do it programmatically. I understand it can be done trivially via Excel. I need to do it at the click of a button, which only accepts a pipe-delimited file with 250,000 rows. – imba22 Aug 17 '16 at 18:51
  • 6
    How about simply reading the file one line at a time and writing one line at a time. That way you only have to store and process a single line in memory and you should never hit any limits. – matt_black Aug 17 '16 at 19:27
  • 1
    Thanks @Matt_black. Thats what I ended up doing – imba22 Aug 17 '16 at 19:30
  • 2
    Agree with @matt_black. Trying to call `ReadAll()` is just nuts on very large files, it's called a `TextStream` object for a reason. Just call `ReadLine()` and process each line at a time. Also don't store and concatenate each line into a variable or you will have the same issue. – user692942 Aug 17 '16 at 19:31
  • 1
    @Lankymart It is worth noting that you can process **huge** text files easily one line at a time. I've managed CSV files with millions of rows and 50,000 columns (both are bigger than a single excel sheet can handle). – matt_black Aug 17 '16 at 19:39
  • @matt_black Absolutely, which is kind of what I was getting at: it's called a "stream" object because it's designed for that very purpose. – user692942 Aug 17 '16 at 21:56
  • a good old-fashioned method to do the job: `more +1 file.csv|findstr /R /V /C:"^[ ]*$" > file.tmp && move /y file.tmp file.csv>nul` – Kul-Tigin Aug 18 '16 at 02:15
  • `@Raystafarian: Cannot do that. The whole point of writing this question is that I need to do it programmatically.` Of course :) So instead of opening the file using FSO, why not open it in Excel programmatically and do what you want? – Siddharth Rout Aug 18 '16 at 03:20

1 Answer


Both your procedures read the entire file into memory, even though they do it in different ways. The correct approach to processing large files is to open the file, process it line by line while writing the output to a temporary file, and then replace the original file with the temp file.

Example:

filename = "..."

Set fso = CreateObject("Scripting.FileSystemObject")

Set inFile  = fso.OpenTextFile(filename)
Set outFile = fso.OpenTextFile(filename & ".tmp", 2, True)
Do Until inFile.AtEndOfStream
  If inFile.Line = 1 Then
    inFile.SkipLine                            'skip first line
  Else
    line = inFile.ReadLine
    If Trim(line) <> "" Then outFile.WriteLine line  'write non-blank lines to output
  End If
Loop
inFile.Close
outFile.Close

fso.DeleteFile filename, True
fso.MoveFile filename & ".tmp", filename

That way you have only one line in memory at any given time. You trade memory usage for disk usage, though, since you create a second file for the output.
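If you need to drop more than one header row, the same streaming approach generalizes. Below is a sketch (not tested against a 150 MB file) that reuses the question's `iNumberOfLinesToDelete` name and combines both of your subs into a single pass, trimming each line before the blank-line check as your `DeleteBlankRows` did:

```vbscript
Const ForWriting = 2
iNumberOfLinesToDelete = 1           'number of header rows to drop (from the question)
filename = "..."

Set fso = CreateObject("Scripting.FileSystemObject")
Set inFile  = fso.OpenTextFile(filename)
Set outFile = fso.OpenTextFile(filename & ".tmp", ForWriting, True)

Do Until inFile.AtEndOfStream
  If inFile.Line <= iNumberOfLinesToDelete Then
    inFile.SkipLine                             'drop header rows
  Else
    line = Trim(inFile.ReadLine)
    If line <> "" Then outFile.WriteLine line   'keep non-blank rows only
  End If
Loop

inFile.Close
outFile.Close
fso.DeleteFile filename, True
fso.MoveFile filename & ".tmp", filename
```

Because `TextStream.Line` reports the current line number of the stream, the header check costs nothing extra, and memory usage stays constant regardless of file size.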

Ansgar Wiechers