Perhaps your pattern scan is inefficient. I can scan a 7 MB file for a pattern in about 1/20th of a second using code like the following. If you really want to use code like this, note that it is not a complete algorithm: after a mismatch you can't, in general, simply set matchedLength back to 0, because the bytes you have already matched may overlap the start of another match. The simple reset is enough for a pattern like this one, whose first byte never appears again later in the pattern, as long as the mismatching byte is itself re-checked against the first byte of the pattern. For the general case you have to pre-process the pattern so you know what to reset to when a match fails; that adds no significant time to the algorithm. I could make the effort to complete the algorithm correctly, but I won't do that now if your question is just about performance. I'm just demonstrating that it is possible to scan large files quickly if you do it correctly.
Sub Main(ByVal args As String())
    If args.Length < 1 Then Return
    Dim startTime As Long = Stopwatch.GetTimestamp()
    Dim pattern As Byte() = System.Text.Encoding.UTF8.GetBytes("SFMB")
    Dim bufferSize As Integer = 4096
    Using reader As New System.IO.FileStream(args(0), IO.FileMode.Open, _
            Security.AccessControl.FileSystemRights.Read, IO.FileShare.Read, bufferSize, IO.FileOptions.SequentialScan)
        Dim buffer(bufferSize - 1) As Byte
        Dim readLength As Integer = reader.Read(buffer, 0, bufferSize)
        ' matchedLength is kept across reads so a match that spans two buffers is still found.
        Dim matchedLength As Integer = 0
        Dim searchPos As Integer = 0
        ' Long, so the reported position does not overflow on files larger than 2 GB.
        Dim fileOffset As Long = 0
        Do While readLength > 0
            For searchPos = 0 To readLength - 1
                If pattern(matchedLength) = buffer(searchPos) Then
                    matchedLength += 1
                ElseIf pattern(0) = buffer(searchPos) Then
                    ' The mismatching byte may itself start a new match (e.g. "SSFMB").
                    matchedLength = 1
                Else
                    ' Only valid because the first byte of this pattern never recurs within it.
                    matchedLength = 0
                End If
                If matchedLength = pattern.Length Then
                    Console.WriteLine("Found pattern at position {0}", fileOffset + searchPos - matchedLength + 1)
                    matchedLength = 0
                End If
            Next
            fileOffset += readLength
            readLength = reader.Read(buffer, 0, bufferSize)
        Loop
    End Using
    Dim endTime As Long = Stopwatch.GetTimestamp()
    Console.WriteLine("Search took {0} seconds", (endTime - startTime) / Stopwatch.Frequency)
End Sub
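For reference, the pre-processing step mentioned above is usually done with a failure table in the style of Knuth-Morris-Pratt. The following is only a rough sketch of that idea, not the code I timed: the names `BuildFailureTable` and `FindAll` are made up for illustration, and it searches an in-memory byte array rather than a buffered file to keep it short.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text

Module KmpSketch
    ' failure(i) = length of the longest proper prefix of pattern(0..i)
    ' that is also a suffix of pattern(0..i).
    Function BuildFailureTable(ByVal pattern As Byte()) As Integer()
        Dim failure(pattern.Length - 1) As Integer
        Dim k As Integer = 0
        For i As Integer = 1 To pattern.Length - 1
            Do While k > 0 AndAlso pattern(k) <> pattern(i)
                k = failure(k - 1)
            Loop
            If pattern(k) = pattern(i) Then k += 1
            failure(i) = k
        Next
        Return failure
    End Function

    ' Returns the start offset of every occurrence (overlapping ones included).
    Function FindAll(ByVal data As Byte(), ByVal pattern As Byte()) As List(Of Integer)
        Dim positions As New List(Of Integer)
        If pattern.Length = 0 Then Return positions
        Dim failure As Integer() = BuildFailureTable(pattern)
        Dim matchedLength As Integer = 0
        For searchPos As Integer = 0 To data.Length - 1
            ' On a mismatch, fall back to the longest prefix that still fits
            ' instead of resetting straight to 0.
            Do While matchedLength > 0 AndAlso pattern(matchedLength) <> data(searchPos)
                matchedLength = failure(matchedLength - 1)
            Loop
            If pattern(matchedLength) = data(searchPos) Then matchedLength += 1
            If matchedLength = pattern.Length Then
                positions.Add(searchPos - matchedLength + 1)
                matchedLength = failure(matchedLength - 1)
            End If
        Next
        Return positions
    End Function

    Sub Main()
        Dim data As Byte() = Encoding.UTF8.GetBytes("AAABAABAAB")
        Dim pattern As Byte() = Encoding.UTF8.GetBytes("AAB")
        For Each position As Integer In FindAll(data, pattern)
            Console.WriteLine(position)   ' prints 1, 4 and 7 on separate lines
        Next
    End Sub
End Module
```

A plain reset to 0 would miss the occurrence at offset 1 here, because the run "AA" that fails against "AAB" still ends in a usable "A". To use this with the buffered loop, the same `matchedLength` fallback would replace the Else branch there.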
EDIT
Here are some thoughts about how you could match multiple patterns at once. This is just off the top of my head and I have not tried to compile the code:
Create a class to contain information about the status of a pattern:
Class PatternInfo
    Public pattern As Byte()
    Public matchedBytes As Integer
End Class
Declare a variable to track all the patterns that you need to check and index them by the first byte of the pattern for quick lookup:
Dim patternIndex As Dictionary(Of Byte, IEnumerable(Of PatternInfo))
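One way the index could be filled is sketched below. This is untested in any real project, and `BuildPatternIndex` is a name I made up for the example; it assumes every pattern is a non-empty byte array and simply groups the patterns by their first byte.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Text

Module PatternIndexSketch
    Class PatternInfo
        Public pattern As Byte()
        Public matchedBytes As Integer
    End Class

    ' Groups the patterns by first byte so the scan can look up, in one step,
    ' which patterns could start at the current byte.
    Function BuildPatternIndex(ByVal patterns As IEnumerable(Of Byte())) _
            As Dictionary(Of Byte, IEnumerable(Of PatternInfo))
        Dim index As New Dictionary(Of Byte, IEnumerable(Of PatternInfo))
        For Each bytes As Byte() In patterns
            If bytes.Length = 0 Then Continue For   ' an empty pattern has no first byte
            Dim existing As IEnumerable(Of PatternInfo) = Nothing
            Dim bucket As List(Of PatternInfo)
            If index.TryGetValue(bytes(0), existing) Then
                bucket = DirectCast(existing, List(Of PatternInfo))
            Else
                bucket = New List(Of PatternInfo)
                index.Add(bytes(0), bucket)
            End If
            bucket.Add(New PatternInfo With {.pattern = bytes, .matchedBytes = 0})
        Next
        Return index
    End Function

    Sub Main()
        Dim patterns As New List(Of Byte()) From {
            Encoding.ASCII.GetBytes("SFMB"),
            Encoding.ASCII.GetBytes("SFX"),
            Encoding.ASCII.GetBytes("MB")}
        Dim index = BuildPatternIndex(patterns)
        Dim keyS As Byte = Encoding.ASCII.GetBytes("S")(0)
        Dim keyM As Byte = Encoding.ASCII.GetBytes("M")(0)
        Console.WriteLine(index(keyS).Count())   ' prints 2
        Console.WriteLine(index(keyM).Count())   ' prints 1
    End Sub
End Module
```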
Check all the patterns that are currently a potential match to see if the next byte also matches on these patterns; if not, stop looking at that pattern at that position:
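Dim activePatterns As New LinkedList(Of PatternInfo)
Dim newPatterns As IEnumerable(Of PatternInfo) = Nothing
' Iterate over a copy so that entries can be removed from the list inside the loop.
For Each activePattern In activePatterns.ToArray()
    If activePattern.pattern(activePattern.matchedBytes) = buffer(searchPos) Then
        activePattern.matchedBytes += 1
        If activePattern.matchedBytes >= activePattern.pattern.Length Then
            Console.WriteLine("Found pattern at position {0}", searchPos - activePattern.matchedBytes + 1)
            ' A completed match has no further byte to compare, so stop tracking it.
            activePatterns.Remove(activePattern)
        End If
    Else
        activePatterns.Remove(activePattern)
    End If
Next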
See if the current byte looks like the beginning of a new pattern that you would be searching for; if so, add it to the list of active patterns:
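If patternIndex.TryGetValue(buffer(searchPos), newPatterns) Then
    For Each newPattern In newPatterns
        ' Track a fresh copy so overlapping candidates of the same pattern keep separate counts.
        ' Note: a one-byte pattern is already complete at this point and should be reported here instead.
        activePatterns.AddLast(New PatternInfo() With { _
            .pattern = newPattern.pattern, .matchedBytes = 1 })
    Next
End If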
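Putting the two steps together, the whole scan might look roughly like the sketch below. Treat it as an illustration only: `FindAllPatterns` is a made-up name, it works on an in-memory byte array instead of a buffered file, it stores the index values as `List(Of PatternInfo)` for brevity, and it reports one-byte patterns directly because they are complete as soon as their first byte matches.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Text

Module MultiPatternSketch
    Class PatternInfo
        Public pattern As Byte()
        Public matchedBytes As Integer
    End Class

    ' Returns (start offset, pattern) pairs in the order the matches complete.
    Function FindAllPatterns(ByVal data As Byte(), ByVal patterns As IEnumerable(Of Byte())) _
            As List(Of Tuple(Of Integer, Byte()))
        ' Index the patterns by their first byte.
        Dim patternIndex As New Dictionary(Of Byte, List(Of PatternInfo))
        For Each bytes As Byte() In patterns
            If bytes.Length = 0 Then Continue For
            If Not patternIndex.ContainsKey(bytes(0)) Then
                patternIndex(bytes(0)) = New List(Of PatternInfo)
            End If
            patternIndex(bytes(0)).Add(New PatternInfo With {.pattern = bytes})
        Next

        Dim matches As New List(Of Tuple(Of Integer, Byte()))
        Dim activePatterns As New LinkedList(Of PatternInfo)
        For searchPos As Integer = 0 To data.Length - 1
            ' Step 1: advance or drop every candidate that is currently in progress.
            For Each candidate In activePatterns.ToArray()
                If candidate.pattern(candidate.matchedBytes) = data(searchPos) Then
                    candidate.matchedBytes += 1
                    If candidate.matchedBytes = candidate.pattern.Length Then
                        matches.Add(Tuple.Create(searchPos - candidate.matchedBytes + 1, candidate.pattern))
                        activePatterns.Remove(candidate)
                    End If
                Else
                    activePatterns.Remove(candidate)
                End If
            Next
            ' Step 2: start a new candidate for every pattern that begins with this byte.
            Dim starters As List(Of PatternInfo) = Nothing
            If patternIndex.TryGetValue(data(searchPos), starters) Then
                For Each starter In starters
                    If starter.pattern.Length = 1 Then
                        matches.Add(Tuple.Create(searchPos, starter.pattern))
                    Else
                        activePatterns.AddLast(New PatternInfo With {
                            .pattern = starter.pattern, .matchedBytes = 1})
                    End If
                Next
            End If
        Next
        Return matches
    End Function

    Sub Main()
        Dim data As Byte() = Encoding.ASCII.GetBytes("xSFMBxSFXMB")
        Dim patterns As New List(Of Byte()) From {
            Encoding.ASCII.GetBytes("SFMB"),
            Encoding.ASCII.GetBytes("SFX"),
            Encoding.ASCII.GetBytes("MB")}
        For Each m In FindAllPatterns(data, patterns)
            Console.WriteLine("{0} at {1}", Encoding.ASCII.GetString(m.Item2), m.Item1)
        Next
        ' Prints: SFMB at 1, MB at 3, SFX at 6, MB at 9 (one per line)
    End Sub
End Module
```

Because every candidate is tracked separately from the position where it started, this approach does not need the reset pre-processing that the single-pattern loop does; the cost is that the list of active candidates can grow when many patterns share prefixes.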