
I have the following procedure that queries a database for an extremely large dataset and writes it to a text file (the resulting .txt file should be roughly 2–3 GB in size).

Sub DBExecuteQueryWriteToFile(ByVal strSQLQuery As String, ByVal FileName As String, Optional ByVal Delimiter As String = ",")

    ' Requires Imports System.IO and the Oracle data provider;
    ' conn is a module-level, already-open OracleConnection.
    Dim buffersize As Integer = 1024 * 1024 ' 1 MB FileStream buffer

    Using cmd As New OracleCommand(strSQLQuery, conn)
        ' FileMode.Create overwrites any existing file, so no Exists/Delete check is needed
        Using FileObject As New FileStream(FileName, FileMode.Create, FileAccess.Write, FileShare.None, buffersize)
            Using StreamWriterObj As New StreamWriter(FileObject)
                Using Reader As OracleDataReader = cmd.ExecuteReader()
                    Dim LastField As Integer = Reader.FieldCount - 1 ' index of the last column

                    ' Write the headers
                    StreamWriterObj.Write(Reader.GetName(0))
                    For i = 1 To LastField
                        StreamWriterObj.Write(Delimiter)
                        StreamWriterObj.Write(Reader.GetName(i))
                    Next
                    StreamWriterObj.WriteLine()

                    ' Write the rows
                    Do While Reader.Read()
                        StreamWriterObj.Write(Reader.Item(0))
                        For i = 1 To LastField
                            StreamWriterObj.Write(Delimiter)
                            StreamWriterObj.Write(Reader.Item(i))
                        Next
                        StreamWriterObj.WriteLine()
                    Loop

                End Using
            End Using
        End Using
    End Using

End Sub

Now my problem is this: when I use it on a smaller dataset, it writes the text file perfectly; when I use it to write the large dataset, it takes a very long time (as I would expect) and then ends up producing a 0 KB file.

I originally thought the problem had to do with querying such a large dataset, and so I posted this question, but was then told that wasn't the issue. So I don't know if this is a limitation on Windows file sizes (I'm using Windows 7, 64-bit, with 16 GB RAM) or if my code is bad somewhere else.

I know I can accomplish my end result by breaking the query down into smaller datasets and writing each one to a separate file, but I would also like to know if my code is incorrect somewhere / could be made to work better.

Any thoughts / insights would be greatly appreciated. Also, even though this is VB code, I'm equally comfortable with VB / C# as a solution.

Thanks!!!

John Bustos
  • So I can't verify this because I just don't have a good way of recreating what you've got, but I think you're getting a `0 byte` file because the buffer isn't getting flushed, ever. I think you have a couple of choices: either set `AutoFlush` to `true` (not recommended), or `Flush()` after a certain number of records (every 10, for example). You'd have to experiment with how often to get it just right. You may need to use `AutoFlush` if you're using `UTF8`, however (http://msdn.microsoft.com/en-us/library/system.io.streamwriter.flush(v=vs.110).aspx). – Mike Perrenoud Feb 24 '14 at 16:33
  • Thanks, @MichaelPerrenoud, never knew about flushing the buffer!! - I'm looking into it now... – John Bustos Feb 24 '14 at 16:37
  • Just set `AutoFlush` to `True`... I'm hoping that does the trick, thanks, @MichaelPerrenoud!! – John Bustos Feb 24 '14 at 16:51
  • It's still running, but I think this is, at the very least, a VERY worthwhile part of the solution. Would you like to post it as a solution? If no one else posts any other glaringly obvious fixes, I'd like to award you the points (if they at all matter to you :) ) – John Bustos Feb 24 '14 at 17:19
  • I have no problem adding it as the solution if it works, I just want to make sure it actually worked. Let me know when it completes and you've verified it. Keep in mind that if you use `Flush()` on a regular basis, vs. `AutoFlush`, you *may* be able to find better performance; but it's a trade-off because you can only keep so much data in the buffer before flushing and so you could risk losing data. It really depends on whether or not you even need to optimize for speed. – Mike Perrenoud Feb 24 '14 at 17:25
  • Does it make sense to you that the code is simply stuck at the `Using Reader As OracleDataReader = cmd.ExecuteReader()` line?? I know it's a large dataset, but it's been stuck there longer than I would have anticipated... any thoughts?? – John Bustos Feb 24 '14 at 18:40
  • I honestly don't know the internals of what has to complete in `ExecuteReader` before it can continue. It's conceivable that the query has to complete entirely and then the result set is simply streamed to the client one row at a time. This would almost certainly be different for each technology as well (e.g. `Microsoft SQL` vs. `Oracle`). Now, IMO, it seems like if it's stuck there it's actually trying to complete the query. – Mike Perrenoud Feb 24 '14 at 18:42
  • Thank you SO much! - I'm going to look into that now... Nonetheless, it only makes sense that I'd have to flush the stream too... Please do post it as an answer and I'll post a new question specifically regarding the DataReader. Thank you so much again!!! – John Bustos Feb 24 '14 at 18:47
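
For reference, the periodic-flush idea from the comments above might look like this applied to the question's procedure. This is only a sketch: the 10,000-row interval is a guess to be tuned, and `conn` is assumed to be the same module-level, already-open OracleConnection used in the question.

```vb
' Sketch of the periodic-Flush variant suggested in the comments.
' Assumes Imports System.IO, an Oracle data provider, and a module-level
' open OracleConnection named conn (as in the question). The flush
' interval of 10,000 rows is an assumption; tune it for your data.
Sub DBExecuteQueryWriteToFileWithFlush(ByVal strSQLQuery As String, ByVal FileName As String, Optional ByVal Delimiter As String = ",")
    Using cmd As New OracleCommand(strSQLQuery, conn)
        Using StreamWriterObj As New StreamWriter(FileName, append:=False)
            Using Reader As OracleDataReader = cmd.ExecuteReader()
                Dim LastField As Integer = Reader.FieldCount - 1
                Dim RowCount As Long = 0

                ' Write the headers
                StreamWriterObj.Write(Reader.GetName(0))
                For i = 1 To LastField
                    StreamWriterObj.Write(Delimiter)
                    StreamWriterObj.Write(Reader.GetName(i))
                Next
                StreamWriterObj.WriteLine()

                ' Write the rows, pushing the buffer to disk every 10,000 records
                Do While Reader.Read()
                    StreamWriterObj.Write(Reader.Item(0))
                    For i = 1 To LastField
                        StreamWriterObj.Write(Delimiter)
                        StreamWriterObj.Write(Reader.Item(i))
                    Next
                    StreamWriterObj.WriteLine()
                    RowCount += 1
                    If RowCount Mod 10000 = 0 Then StreamWriterObj.Flush()
                Loop
            End Using
        End Using
    End Using
End Sub
```

Alternatively, setting `StreamWriterObj.AutoFlush = True` right after constructing the writer flushes after every `Write` call, which is simpler but likely slower for a 2–3 GB file.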

0 Answers