
I have a simple program that reads a .txt file and then splits it into many files of `pMaxRows` rows each. These .txt files are huge - some are nearly 25 GB. Right now it is not running fast enough for my liking; I suspect there should be a way to improve the efficiency, maybe by reading/writing multiple lines at once, but I am not very experienced with the VB.NET StreamReader/StreamWriter.

Code is below:

Public Sub Execute(ByVal pFileLocation As String, _
                   ByVal pMaxRows As Int32)

    Dim sr As IO.StreamReader
    Dim Row As String
    Dim SourceRowCount As Int64
    Dim TargetRowCount As Int64
    Dim TargetFileNumber As Int32
    ''Does the file exist in that location?
    If IO.File.Exists(pFileLocation) = False Then
        Throw New Exception("File does not exist at " & pFileLocation)
    End If

    ''Split FileLocation into FileName and Folder Location
    Dim arrFileLoc() As String = pFileLocation.Split("\"c)
    Dim i As Integer = arrFileLoc.Length - 1
    Dim FileName As String = arrFileLoc(i)
    Dim FileLocationLength As Integer = pFileLocation.Length
    Dim FileNameLength As Integer = FileName.Length
    Dim Folder As String = pFileLocation.Remove(FileLocationLength - FileNameLength, FileNameLength)



    ''Read the file
    sr = New IO.StreamReader(pFileLocation)
    SourceRowCount = 0
    TargetRowCount = 0
    TargetFileNumber = 1


    ''Create First Target File Name
    Dim TargetFileName As String
    TargetFileName = TargetFileNumber & "_" & FileName

    ''Open streamreader and start reading lines
    Do While Not sr.EndOfStream


        ''if it hits the target number of rows: 
        If (TargetRowCount = pMaxRows) Then

            ''Advance target file number
            TargetFileNumber += 1
            ''Create New file with target file number
            TargetFileName = TargetFileNumber & "_" & FileName

            ''Set target row count back to 0
            TargetRowCount = 0

        End If
        ''Read line
        Row = sr.ReadLine()

        ''Write line
        Using sw As New IO.StreamWriter(Folder & TargetFileName, True)
            sw.WriteLine(Row)
        End Using

        SourceRowCount += 1
        TargetRowCount += 1

    Loop
    sr.Close()
End Sub

Anyone have any suggestions? Even directing me to the right place if this has been answered before would be much appreciated

  • Probably should be asked at [Code Review](http://codereview.stackexchange.com/) – LarsTech Oct 21 '15 at 19:16
  • 1
    If accumulating several lines (start with, say, Math.Min(pMaxRows, 1000)) to write into a StringBuilder and writing those doesn't speed it up enough, go to Code Review as LarsTech suggested. – Andrew Morton Oct 21 '15 at 19:22
  • 1
    It looks like you are opening the `StreamWriter` for every row written - try only opening it once for each output file. – Mark Oct 21 '15 at 20:08
  • You can read your file a certain number of lines (or characters/bytes) at a time and pass those lines to a new thread, doing the writing on a different thread. To avoid overwhelming your machine, limit it to, say, 10 threads at a time. – T.S. Oct 21 '15 at 20:44
  • I'd try reads/writes of blocks of bytes instead of lines: read a 10 MB block into a StringBuilder, find the next line break byte by byte, write the 10 MB block plus the remainder of the last line, repeat. Easier if records are fixed length. Depends on the criteria for splitting the blocks, of course. – rheitzman Oct 21 '15 at 21:04
  • @Mark Good call, I will try that - I didn't see that inefficiency. – Bryan Oct 21 '15 at 21:57
  • @rheitzman I thought about this, but unfortunately there can be multiple record types of varying sizes, so it has to be line by line. @Andrew - so I should read lines one at a time into a StringBuilder, then write the block? – Bryan Oct 21 '15 at 22:14
  • @Bryan Yes. And don't forget to write the final block ;) If you can arrange to write the files to a physically different local drive that might help too. – Andrew Morton Oct 22 '15 at 09:17
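
Putting the comments above together - chiefly Mark's point that the StreamWriter should be opened once per output file rather than once per row - a minimal sketch might look like the following. The `ExecuteBuffered` name, the 64 KB buffer sizes, and the UTF-8 encoding are illustrative assumptions, not requirements; adjust the encoding to match the source files.

    Public Sub ExecuteBuffered(ByVal pFileLocation As String, _
                               ByVal pMaxRows As Int32)

        If Not IO.File.Exists(pFileLocation) Then
            Throw New Exception("File does not exist at " & pFileLocation)
        End If

        ''Let Path handle the folder/file-name split
        Dim Folder As String = IO.Path.GetDirectoryName(pFileLocation)
        Dim FileName As String = IO.Path.GetFileName(pFileLocation)

        Dim TargetFileNumber As Int32 = 1
        Dim TargetRowCount As Int64 = 0

        ''64 KB read buffer (assumed size - tune as needed)
        Using sr As New IO.StreamReader(pFileLocation, System.Text.Encoding.UTF8, True, 65536)
            Dim sw As IO.StreamWriter = Nothing
            Try
                Do While Not sr.EndOfStream

                    ''Open a writer only when rolling over to a new
                    ''output file, not once per row
                    If sw Is Nothing OrElse TargetRowCount = pMaxRows Then
                        If sw IsNot Nothing Then sw.Close()
                        Dim TargetFileName As String = TargetFileNumber & "_" & FileName
                        sw = New IO.StreamWriter(IO.Path.Combine(Folder, TargetFileName), _
                                                 False, System.Text.Encoding.UTF8, 65536)
                        TargetFileNumber += 1
                        TargetRowCount = 0
                    End If

                    sw.WriteLine(sr.ReadLine())
                    TargetRowCount += 1

                Loop
            Finally
                ''Flush and close the last (possibly partial) output file
                If sw IsNot Nothing Then sw.Close()
            End Try
        End Using
    End Sub

With one writer per output file and buffered streams on both sides, each row costs only a buffered WriteLine. If that is still too slow, accumulating several lines in a StringBuilder before each write, as Andrew Morton suggests, would be the next step.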
