I'm creating a Windows Forms application that lets the user choose a text file as a data source, dynamically creates form controls based on the number of columns in the file, and lets the user enter search parameters that are used to search the file when a search button is clicked. Any results are written to a new text file.
The files this program searches are often quite large (up to 12 GB). My current search method (read a line, test it, and append it to the results file if it's a hit) works well for reasonably sized files (a few MB or so), but with my "large" test file (~2.5 GB) a search takes about 12 minutes.
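For scale, and assuming search time grows roughly linearly with file size (an assumption, not a measurement), the 2.5 GB timing works out as follows:

$$
\text{throughput} \approx \frac{2.5\ \text{GB}}{12\ \text{min}} = \frac{2500\ \text{MB}}{720\ \text{s}} \approx 3.5\ \text{MB/s},
\qquad
t_{12\,\text{GB}} \approx \frac{12\ \text{GB}}{2.5\ \text{GB}} \times 12\ \text{min} \approx 58\ \text{min}.
$$

About 3.5 MB/s is far below the sequential read rate a typical disk can sustain, which suggests (but does not prove) that the per-line work, rather than the disk, is the bottleneck.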
So my question is: what would be the best way to improve performance? After a lot of searching and reading, I know I have the following options:
- Async methods
- Tasks
- TPL dataflow
- Some combination of these approaches
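For comparison with the options above, the simplest use of Async/Tasks is to move the existing sequential loop off the UI thread with `Task.Run`. This keeps the form responsive during a long search, but by itself it probably won't make a single sequential read much faster. The sketch below is not the original code: `SearchFile`, `SearchFileInBackgroundAsync` and the `isMatch` predicate are hypothetical names, the header line is copied as-is, and `isMatch` must not read any WinForms controls because it runs on a thread-pool thread (so `validChromosome` as written cannot be passed in directly).

```vb
Imports System.IO
Imports System.Threading.Tasks

Module BackgroundSearchSketch
    ' Hypothetical helper: the same sequential read/test/write loop as the search
    ' button handler, parameterised so it can run away from the UI thread.
    ' Assumption: isMatch is thread-safe and never touches WinForms controls.
    Public Function SearchFile(sourcePath As String, destPath As String,
                               isMatch As Func(Of String, Boolean)) As Integer
        Dim hits As Integer = 0
        Using reader As New StreamReader(sourcePath), writer As New StreamWriter(destPath, False)
            ' Copy the header line through unchanged.
            Dim header As String = reader.ReadLine()
            If header Is Nothing Then Return 0
            writer.WriteLine(header)

            ' ReadLine returns Nothing at end of file.
            Dim currentLine As String = reader.ReadLine()
            Do While currentLine IsNot Nothing
                If isMatch(currentLine) Then
                    writer.WriteLine(currentLine)
                    hits += 1
                End If
                currentLine = reader.ReadLine()
            Loop
        End Using
        Return hits
    End Function

    ' Runs SearchFile on a thread-pool thread and returns the hit count as a Task.
    Public Function SearchFileInBackgroundAsync(sourcePath As String, destPath As String,
                                                isMatch As Func(Of String, Boolean)) As Task(Of Integer)
        Return Task.Run(Function() SearchFile(sourcePath, destPath, isMatch))
    End Function
End Module
```

In a form, the handler would become `Private Async Sub searchBtn_Click(...)`, and after the Save dialog it could do `searchTxtB.Text = (Await SearchFileInBackgroundAsync(filepath, exportFD.FileName, isMatchFunc)).ToString()`, where `isMatchFunc` is whatever thread-safe predicate you build from the user's criteria.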
Since my program's logic is essentially a stream (read, search, write), I'm leaning towards TPL Dataflow, but I'm unsure how to implement it properly or whether there's a better solution. Below is the code for the search button's Click event handler and the functions it uses.
'Searches the loaded file
Private Sub searchBtn_Click(sender As Object, e As EventArgs) Handles searchBtn.Click
    Dim strFileName As String
    Dim searchHits As Integer

    'Prompts user to enter title of file to be created
    exportFD.Title = "Save as. . ."
    exportFD.Filter = "Text Files(*.txt)|*.txt" 'Limits user to only saving as .txt file
    'Handles if Cancel Button is clicked: check the dialog's own result
    If exportFD.ShowDialog() = DialogResult.Cancel Then
        Return
    End If

    'Start timing after the dialog so the user's time in it is not counted
    Dim watch As Stopwatch = Stopwatch.StartNew()
    strFileName = exportFD.FileName
    'Using blocks close both files even if an exception is thrown
    Using writer As New IO.StreamWriter(strFileName, False), reader As New IO.StreamReader(filepath)
        'Skip first line of SOURCE text file for search, but use it to write column headers to file
        Dim currentLine As String = reader.ReadLine()
        Dim columnLine = currentLine.Split(ControlChars.Tab)
        'First: Insert column names into NEW text file
        For col As Integer = 0 To colCount - 1
            writer.Write(columnLine(col) & vbTab)
        Next
        writer.Write(vbNewLine)

        'Search whole file, line by line; ReadLine returns Nothing at end of file
        currentLine = reader.ReadLine()
        Do While currentLine IsNot Nothing
            If validChromosome(currentLine) Then
                writer.WriteLine(currentLine)
                searchHits += 1
            End If
            currentLine = reader.ReadLine()
        Loop
    End Using

    'Tell user file was saved
    searchTxtB.Text = searchHits.ToString()
    watch.Stop()
    MsgBox("Searched in: " & watch.Elapsed.ToString() & " and saved to: " & strFileName)
End Sub
'This function searches through the current line and checks if it matches what the user has searched for
Private Function validChromosome(chromString As String) As Boolean
    'Split line by delimiter
    Dim readRow() As String = Split(chromString, vbTab)
    validChromosome = True 'Start off as true
    Dim rowLength As Integer = readRow.Length - 1
    'Iterate through string tokens and compare
    For token As Integer = 0 To rowLength
        Try
            Dim currentGroupBox As GroupBox = criteriaPanel.Controls.Item(token)
            Dim checkedParameter As CheckBox = currentGroupBox.Controls("CheckBox")
            'User wants to search this parameter
            If checkedParameter.Checked Then
                Dim numericRadio As RadioButton = currentGroupBox.Controls("NumericRadio")
                'Searching by number
                If numericRadio.Checked Then
                    Dim value As Decimal
                    Dim lowerBox As NumericUpDown = currentGroupBox.Controls("NumericBoxLower")
                    Dim upperBox As NumericUpDown = currentGroupBox.Controls("NumericBoxUpper")
                    Dim lowerInclusiveCheck As CheckBox = currentGroupBox.Controls("NumericInclusiveLowerCheckBox")
                    Dim upperInclusiveCheck As CheckBox = currentGroupBox.Controls("NumericInclusiveUpperCheckBox")
                    'Try to convert the text to a decimal.
                    If Not Decimal.TryParse(readRow(token), value) Then
                        validChromosome = False
                        Exit For
                    End If
                    'Not within the given range user inputted for numeric search
                    If Not withinRange(value, lowerBox.Value, upperBox.Value, lowerInclusiveCheck.Checked, upperInclusiveCheck.Checked) Then
                        validChromosome = False
                        Exit For
                    End If
                Else 'Searching by text
                    Dim textBox As TextBox = currentGroupBox.Controls("TextBox")
                    'If the comparison failed, then this chromosome is not valid. Break out of loop and return false.
                    If Not String.Equals(readRow(token), textBox.Text, StringComparison.OrdinalIgnoreCase) Then
                        validChromosome = False
                        Exit For
                    End If
                End If
            End If
        Catch ex As Exception
            'Simple error checking.
            MsgBox(ex.ToString)
            validChromosome = False
            Exit For
        End Try
    Next
End Function
'Function to check if value is safely in between two values
Private Function withinRange(value As Decimal, lower As Decimal, upper As Decimal, inclusiveLower As Boolean, inclusiveUpper As Boolean) As Boolean
    Dim lowerCheck As Boolean = False
    Dim upperCheck As Boolean = False
    If inclusiveLower Then
        lowerCheck = value >= lower
    Else
        lowerCheck = value > lower
    End If
    If inclusiveUpper Then
        upperCheck = value <= upper
    Else
        upperCheck = value < upper
    End If
    withinRange = lowerCheck AndAlso upperCheck
End Function
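One likely cost in `validChromosome` (a guess from reading the code, not a profiler result) is that it looks up GroupBox, CheckBox, RadioButton and NumericUpDown controls by index and name for every field of every line, inside a per-field `Try`/`Catch`. A common alternative is to read the user's criteria from the controls once, on the UI thread, into plain objects, and then test each line against those. The sketch below assumes two hypothetical classes, `ColumnCriterion` and `SearchCriteria`; unlike `validChromosome`, it treats a line with fewer fields than a searched column as a non-match.

```vb
Imports System.Collections.Generic

' Hypothetical plain-data snapshot of one GroupBox's settings, filled in once on the UI thread.
Public Class ColumnCriterion
    Public Property ColumnIndex As Integer
    Public Property IsNumeric As Boolean
    Public Property Lower As Decimal
    Public Property Upper As Decimal
    Public Property LowerInclusive As Boolean
    Public Property UpperInclusive As Boolean
    Public Property SearchText As String

    ' Same rules as validChromosome/withinRange, but with no control lookups.
    Public Function Matches(field As String) As Boolean
        If IsNumeric Then
            Dim value As Decimal
            If Not Decimal.TryParse(field, value) Then Return False
            Dim lowerOk As Boolean = If(LowerInclusive, value >= Lower, value > Lower)
            Dim upperOk As Boolean = If(UpperInclusive, value <= Upper, value < Upper)
            Return lowerOk AndAlso upperOk
        End If
        Return String.Equals(field, SearchText, StringComparison.OrdinalIgnoreCase)
    End Function
End Class

' Hypothetical container holding only the columns the user ticked.
Public Class SearchCriteria
    Private ReadOnly _criteria As List(Of ColumnCriterion)

    Public Sub New(criteria As IEnumerable(Of ColumnCriterion))
        _criteria = New List(Of ColumnCriterion)(criteria)
    End Sub

    ' Only reads data copied at construction, so it can be called from worker threads.
    Public Function IsMatch(line As String) As Boolean
        Dim fields() As String = line.Split(ControlChars.Tab)
        For Each c As ColumnCriterion In _criteria
            ' Assumption: a line too short to contain a searched column is not a hit.
            If c.ColumnIndex >= fields.Length Then Return False
            If Not c.Matches(fields(c.ColumnIndex)) Then Return False
        Next
        Return True
    End Function
End Class
```

The snapshot would be built in the Click handler by looping over `criteriaPanel.Controls` once and, for each GroupBox whose "CheckBox" is checked, adding a `ColumnCriterion` with its index, the "NumericRadio" state, the two NumericUpDown values, the two inclusive check boxes and the TextBox text. The search loop then calls `criteria.IsMatch(currentLine)` instead of `validChromosome(currentLine)`.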
My current theory is that I should create a TransformBlock containing my file-reading method that fills a small buffer (~10 lines), pass that buffer to another TransformBlock that searches the lines and puts the hits in a list, and then pass that list to a third TransformBlock that writes it to the export file.
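The read, search and write pipeline described above could be sketched with TPL Dataflow roughly as follows. This is a sketch under assumptions, not a tested solution: it needs the System.Threading.Tasks.Dataflow library (a NuGet package on .NET Framework), `SearchFileAsync` and its parameters are hypothetical, and `isMatch` must be a thread-safe predicate that never reads WinForms controls. The first stage is a plain read loop rather than a TransformBlock, because a block only runs when something posts to it; the last stage is an ActionBlock because it produces no output. Batches of about a thousand lines (an arbitrary default) replace the ~10-line buffer, since each message carries some scheduling overhead; the best size would need measuring. Dataflow can only help if the filtering, rather than the disk, is the bottleneck.

```vb
Imports System.Collections.Generic
Imports System.IO
Imports System.Threading.Tasks
Imports System.Threading.Tasks.Dataflow

Module DataflowSearchSketch
    ' Hypothetical pipeline: read loop -> parallel filter (TransformBlock) -> single writer (ActionBlock).
    Public Async Function SearchFileAsync(sourcePath As String, destPath As String,
                                          isMatch As Func(Of String, Boolean),
                                          Optional batchSize As Integer = 1000) As Task(Of Integer)
        Dim hits As Integer = 0

        Using reader As New StreamReader(sourcePath), writer As New StreamWriter(destPath, False)
            ' Copy the header line straight through.
            Dim header As String = reader.ReadLine()
            If header IsNot Nothing Then writer.WriteLine(header)

            ' Filter batches on several threads; output order matches input order by default.
            Dim filterBlock As New TransformBlock(Of List(Of String), List(Of String))(
                Function(batch As List(Of String)) batch.FindAll(Function(line) isMatch(line)),
                New ExecutionDataflowBlockOptions With {
                    .MaxDegreeOfParallelism = Environment.ProcessorCount,
                    .BoundedCapacity = 8})

            ' One writer: MaxDegreeOfParallelism defaults to 1, so the StreamWriter is never shared.
            Dim writeBlock As New ActionBlock(Of List(Of String))(
                Sub(matches)
                    For Each line As String In matches
                        writer.WriteLine(line)
                    Next
                    hits += matches.Count
                End Sub,
                New ExecutionDataflowBlockOptions With {.BoundedCapacity = 8})

            filterBlock.LinkTo(writeBlock, New DataflowLinkOptions With {.PropagateCompletion = True})

            ' Read loop: SendAsync waits while the bounded blocks are full, limiting memory use.
            Dim batch As New List(Of String)(batchSize)
            Dim currentLine As String = reader.ReadLine()
            Do While currentLine IsNot Nothing
                batch.Add(currentLine)
                If batch.Count = batchSize Then
                    ' False means the pipeline has faulted; stop reading.
                    If Not Await filterBlock.SendAsync(batch) Then Exit Do
                    batch = New List(Of String)(batchSize)
                End If
                currentLine = reader.ReadLine()
            Loop
            If batch.Count > 0 Then Await filterBlock.SendAsync(batch)

            ' Signal end of input and wait for the writer (rethrows any pipeline error).
            filterBlock.Complete()
            Await writeBlock.Completion
        End Using

        Return hits
    End Function
End Module
```

From an `Async Sub searchBtn_Click`, this could be called as `Dim hits As Integer = Await SearchFileAsync(filepath, exportFD.FileName, isMatchFunc)`, with the UI updated after the `Await` returns.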
My search function (validChromosome) is probably not very efficient, so any suggestions for improving it would also be welcome. This is my first program, and I know VB.NET likely isn't the best language for text-file manipulation, but I'm required to use it. Thanks in advance for any help, and please let me know if more information is needed.