I have a "container" file containing data. Its size is roughly 100 MB. The container holds several "dataids" that each mark the beginning of a piece of data.
Now I need to get the index (the position in the file) of a given dataid, for example '4CFE7197-0029-006B-1AD4-000000000012'.
I have tried several approaches, but at the moment reading all bytes at once ("ReadAll" below) is the fastest.
ReadAll -> average of 0.6 seconds
    Using oReader As New BinaryReader(File.Open(sContainerPath, FileMode.Open, FileAccess.Read))
        ' Read the whole container into memory in one call.
        Dim iLength As Integer = CInt(oReader.BaseStream.Length)
        Dim oValue As Byte() = oReader.ReadBytes(iLength)
        Dim enc As New System.Text.ASCIIEncoding
        Dim sFileContent As String = enc.GetString(oValue)
        ' Escape the id so it is matched literally, not as a regex pattern.
        Dim r As Regex = New Regex(Regex.Escape(sDataId))
        Dim oMatch As Match = r.Match(sFileContent)
        ' Test Success rather than Index > 0: Index is 0 both when nothing
        ' matched and when the match starts at position 0.
        If oMatch.Success Then
            Return oMatch.Index
        End If
    End Using
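ReadByteByByte -> average of 1.4 seconds
    Using oReader As BinaryReader = New BinaryReader(File.Open(sContainerPath, FileMode.Open, FileAccess.Read))
        Dim valueSearch As StringSearch = New StringSearch(sDataId)
        ' Stream.ReadByte returns an Integer and -1 at end of file.
        ' BinaryReader.ReadByte returns a Byte (never < 0) and throws at end of file,
        ' so the loop condition has to be tested on the stream's ReadByte.
        Dim readByte As Integer
        While InlineAssignHelper(readByte, oReader.BaseStream.ReadByte()) >= 0
            index += 1
            If valueSearch.Found(CByte(readByte)) Then
                Return index - iDataIdLength
            End If
        End While
    End Using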
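    ' Incremental matcher: feed it one byte at a time. It uses a Knuth-Morris-Pratt
    ' fallback table so that a partial match is not lost on a mismatch
    ' (a plain reset would miss "AB" in "AAB").
    Public Class StringSearch
        Private ReadOnly oValue() As Byte
        Private ReadOnly iFallback() As Integer   ' KMP failure table
        Private iMatched As Integer = 0           ' pattern bytes matched so far
        Public Sub New(value As String)
            Dim oEncoding As New System.Text.ASCIIEncoding
            Me.oValue = oEncoding.GetBytes(value)
            ' iFallback(i) = length of the longest proper prefix of the pattern
            ' that is also a suffix of oValue(0..i).
            Me.iFallback = New Integer(oValue.Length - 1) {}
            Dim k As Integer = 0
            For i As Integer = 1 To oValue.Length - 1
                While k > 0 AndAlso oValue(i) <> oValue(k)
                    k = iFallback(k - 1)
                End While
                If oValue(i) = oValue(k) Then k += 1
                iFallback(i) = k
            Next
        End Sub
        Public Function Found(oNextByte As Byte) As Boolean
            ' On a mismatch, fall back to the longest prefix that can still match.
            While iMatched > 0 AndAlso oValue(iMatched) <> oNextByte
                iMatched = iFallback(iMatched - 1)
            End While
            If oValue(iMatched) = oNextByte Then iMatched += 1
            If iMatched = oValue.Length Then
                iMatched = iFallback(iMatched - 1)
                Return True
            End If
            Return False
        End Function
    End Class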
    Public Function InlineAssignHelper(Of T)(ByRef target As T, ByVal value As T) As T
        target = value
        Return value
    End Function
I find it hard to believe that there is no faster way. 0.6 seconds for a 100 MB file is not an acceptable time.
Another approach I tried was to read the file in chunks of X bytes (100, 1000, ...), but that was a lot slower.
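For illustration, a chunked search of this kind can be sketched as follows. This is only a minimal sketch, not the code that was timed: the module name ChunkedSearch, the function name FindInChunks, the 64 KB default chunk size and the overlap handling are all assumptions. It carries the last (pattern length - 1) bytes of each chunk over to the next one, so a dataid that straddles a chunk boundary is still found.

```vbnet
Imports System
Imports System.IO
Imports System.Text

Module ChunkedSearch
    ' Hypothetical helper: returns the byte offset of the first occurrence of
    ' sDataId in the file, or -1 if it does not occur.
    Public Function FindInChunks(sContainerPath As String, sDataId As String,
                                 Optional iChunkSize As Integer = 65536) As Long
        Dim oPattern As Byte() = Encoding.ASCII.GetBytes(sDataId)
        If oPattern.Length = 0 Then Return -1
        If iChunkSize < oPattern.Length Then iChunkSize = oPattern.Length

        ' Room for one chunk plus up to (pattern length - 1) carried-over bytes.
        Dim oBuffer(iChunkSize + oPattern.Length - 2) As Byte
        Dim iCarried As Integer = 0      ' bytes kept from the previous chunk
        Dim lBufferStart As Long = 0     ' file offset of oBuffer(0)

        Using oStream As New FileStream(sContainerPath, FileMode.Open, FileAccess.Read)
            Do
                Dim iRead As Integer = oStream.Read(oBuffer, iCarried, iChunkSize)
                If iRead <= 0 Then Exit Do
                Dim iValid As Integer = iCarried + iRead

                ' Naive scan of the valid part of the buffer.
                For i As Integer = 0 To iValid - oPattern.Length
                    Dim j As Integer = 0
                    While j < oPattern.Length AndAlso oBuffer(i + j) = oPattern(j)
                        j += 1
                    End While
                    If j = oPattern.Length Then Return lBufferStart + i
                Next

                ' Keep the tail so a match spanning two chunks is not missed.
                Dim iKeep As Integer = Math.Min(oPattern.Length - 1, iValid)
                Array.Copy(oBuffer, iValid - iKeep, oBuffer, 0, iKeep)
                lBufferStart += iValid - iKeep
                iCarried = iKeep
            Loop
        End Using
        Return -1
    End Function
End Module
```

It would be called as FindInChunks(sContainerPath, sDataId). Very small chunk sizes such as 100 or 1000 bytes mean a very large number of read calls for a 100 MB file, which may be part of why those attempts were slower.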
Are there any other approaches I could try?