0

I have a program that reads in a folder of files (4GB all together), indexes them, then adds them to a linked SQL database. This all occurs before the form even loads.

Now I wish to have another file containing one 13 digit number per line and query the SQL database from VB searching for each number to return which folder, file, and line number contains that number.

The method I have below works now, but it's slow. I need to find a much quicker way to go straight to the specific line in a file for my number and write a new line of text at that location.

Dim result = From n In System.IO.File.ReadLines("G:\USER\SearchThese.txt")
                     Select n.Substring(0, 13)


Dim MyFilePath As String
Dim linePos As String
Dim lines As String
'-- connection
Dim con As New SqlConnection(***MY SQL CONNECTION***)

Dim dataset As New DataSet
Dim datatable As DataTable
Dim dataadapter As New SqlDataAdapter
Dim sql As String

Dim i As Integer

'-- command
Dim cmd As New SqlCommand()
con.Open()

Using sw As New StreamWriter("G:\USER\TESTRUN1.txt")
    For Each word As String In result
        i = 0
        sql = ("SELECT * FROM Test_Table WHERE DigNum = @word")
        dataadapter = New SqlDataAdapter(sql, con)
        dataadapter.SelectCommand.Parameters.AddWithValue("@word", word)
        dataset = New DataSet()
        dataadapter.Fill(dataset, "Test_Table")

        While i < dataset.Tables("Test_Table").Rows.Count

            linePos = dataset.Tables("Test_Table").Rows(i).Item(4).ToString()
            MyFilePath = dataset.Tables("Test_Table").Rows(i).Item(1).ToString()
            i += 1
            Using sr As New StreamReader(MyFilePath)
                For n As Integer = 1 To linePos
                    lines = sr.ReadLine
                Next
                sw.WriteLine(lines)
            End Using

        End While


   Next

End Using
MsgBox("Complete!") 

Like I said this works fine, but even searching for just 5 numbers takes fairly long. I'm guessing the streamreader that slows it down, but I don't know.

For anyone asking, my database table has these columns:

DigNum | FilPth | FilDte | DteAdd | LnePos
K.Madden
  • 353
  • 1
  • 16
  • If the content of the file doesn't change (except if appended) then you should consider storing the position inside the files. It's much easier to say "read the line starting at position 23423" instead of "read the 423th line" – the_lotus Dec 12 '18 at 15:18
  • Sorry @the_lotus don't think I full quite understand – K.Madden Dec 12 '18 at 15:21
  • You need to call ReadLine until you get to the line you want. If you stored the position of that line (from the start of the file) you could just [Seek](https://learn.microsoft.com/en-us/dotnet/api/system.io.filestream.seek?view=netframework-4.7.2) to that position and start reading from there. – the_lotus Dec 12 '18 at 15:23
  • For reference, here is how the file is indexed: https://stackoverflow.com/questions/53520276/reading-large-text-file-very-slow – Joel Coehoorn Dec 12 '18 at 16:19
  • One thing to keep in mind here: writing a change back to the file invalidates the index I helped you create two weeks ago. You'll have to re-build the index every time you change the file. – Joel Coehoorn Dec 12 '18 at 16:28
  • See I’m not changing any the files I’m searching in. Im just trying to like find the lines then basically copy the full line and write it into a results.txt almost – K.Madden Dec 12 '18 at 17:34
  • @the_lotus and Joel Coehoorn thank you very much for the advice and help got it working it seems, Posted below. – K.Madden Dec 13 '18 at 13:19

2 Answers2

0

When trying to optimise performance in a piece of code like this you need to measure the performance on a call by call basis. You can do this using Visual Studio's profiling tools (see: https://learn.microsoft.com/en-us/visualstudio/profiling/profiling-feature-tour?view=vs-2017)

Once you have this data you will be able to directly see what is causing the performance issue rather than guessing. You will also have measurements so when you change your code you can see if the performance has improved or gotten worse. This again takes the guesswork out of the process, this is important as modern compilers and systems do a lot behind the scenes to improve performance but it's easy to derail these optimisations.

Stephen Turner
  • 7,125
  • 4
  • 51
  • 68
0

Hi guys so got one working and seems to work perfect almost instant. Was able to get it due to the help in comments!

Using sw As New StreamWriter("G:\USER\TESTRUN1.txt")
            For Each word As String In result
                i = 0
                sql = "SELECT * FROM Test_Table WHERE DigNbr = @word"
                dataadapter = New SqlDataAdapter(sql, con)
                dataadapter.SelectCommand.Parameters.AddWithValue("@word", word)
                dataset = New DataSet()
                dataadapter.Fill(dataset, "Test_Table")

                While i < dataset.Tables("Test_Table").Rows.Count

                    linePos = dataset.Tables("Test_Table").Rows(i).Item(4).ToString()
                    MyFilePath = dataset.Tables("Test_Table").Rows(i).Item(1).ToString()
                    i += 1


                    Using sr As New StreamReader(MyFilePath)
                        sr.BaseStream.Seek(4096 * (linePos - 1), SeekOrigin.Begin)
                        FoundWords.Add(sr.ReadLine)
                        For Each item As String In FoundWords
                            sw.WriteLine(item)

                        Next
                        FoundWords.Clear()
                    End Using

Basically for anyone reading I used the seek method, Every line is 4096 bytes so in other words a new line starts every 4097 so with my Line Number I have which indicates which line the 13 digit number I'm looking for is on I just simply multiple it so the seek goes straight to that byte(line) and gets the line.

e.g say number I was is on line 10, 4096 * 10 is 40960, which is where line 10 starts. Just like one the comments say, instead of search for line 55 I search basically for position (55*4096).

I'm quite bad at explaining but hope that helps someone.

K.Madden
  • 353
  • 1
  • 16