I have a folder of UTF-8 text files that contain Japanese data. I need to search those files for a specific string and, for each match, copy the complete line and its line number (a single file can contain multiple matches). FSO can't read UTF-8 data, so I have to use an ADO stream instead. I don't know how to read the text file line by line with ADO (the equivalent of .ReadLine), or how to use a search function like InStr(pos, s, FindThis, vbTextCompare), which we use with FSO.

Sub TestR_utf_8()
    Dim st As ADODB.Stream
    Dim sPathname As String, sText As String
    
    sPathname = "c:\tmp\test_utf-8.txt"
    
    ' create a stream object
    Set st = New ADODB.Stream
    
    ' set properties
    st.Charset = "utf-8"
    st.Type = adTypeText
    
    ' open the stream object and load the text
    st.Open
    st.LoadFromFile (sPathname)
    
    ' read the entire stream (ReadText with no argument reads to the end)
    sText = st.ReadText
    
    ' display the characters read
    MsgBox sText
    
    st.Close
    Set st = Nothing
End Sub
Gaurav
  • Can you not read the entire file into a String (as you're doing), then split the String into an array of Strings (using `Split` and a delimiter, e.g. `vbNewLine`), then loop through the array using `InStr` to look for matches (note that `InStr` is a VBA function, nothing to do with FSO)? ... or is writing the code to do that the problem? – JohnM Jun 17 '23 at 12:05
  • Be aware that MsgBox cannot display non-ANSI Unicode characters, so please check whether your problem is reading the text or searching/comparing the string. – Shrotter Jun 17 '23 at 12:35

1 Answer

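The Visual Basic Editor does not reliably display Unicode characters outside the system's ANSI code page, so a Japanese search term generally can't be typed into the code as a string literal. Instead, build the search term with ChrW and, as suggested by @JohnM, read the whole file, split it into lines, and test each line with InStr: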

Sub TestR_utf_8()

    Dim st As ADODB.Stream
    Dim sPathname As String, sText As String
    Dim linesOfText() As String
    Dim i As Long
    
    sPathname = "c:\tmp\test_utf-8.txt"
    
    ' create a stream object
    Set st = New ADODB.Stream
    
    ' set properties
    st.Charset = "utf-8"
    st.Type = adTypeText
    
    ' open the stream object and load the text
    st.Open
    st.LoadFromFile sPathname
    
    ' read all characters from text stream
    sText = st.ReadText
    
    st.Close
    
    ' split into a zero-based array of strings (assumes CRLF line endings)
    linesOfText = Split(sText, vbNewLine)
    
    ' loop through the array of strings using InStr to find a match
    For i = LBound(linesOfText) To UBound(linesOfText)
        If InStr(1, linesOfText(i), ChrW(&H898B), vbTextCompare) > 0 Then
            'code to copy line, etc
            '
            '
        End If
    Next i
    
    Set st = Nothing
    
End Sub

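Note that ChrW(&H898B) (the character 見) is only a sample search term; replace it with the string you need to find. Because Split returns a zero-based array, the line number of a match in the file is i + 1.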
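
As a sketch of how the placeholder inside the loop might be filled in, the helper below collects every matching line together with its 1-based line number. The name FindMatchingLines and the "lineNumber<TAB>lineText" result format are assumptions made for illustration, not part of the original code. It also normalizes line endings first, in case some files use LF only rather than CRLF.

```vba
' Hypothetical helper: returns a Collection of "lineNumber<TAB>lineText"
' entries, one for every line of sText that contains sFind.
Function FindMatchingLines(ByVal sText As String, ByVal sFind As String) As Collection
    Dim results As New Collection
    Dim linesOfText() As String
    Dim i As Long

    ' Normalize CRLF and lone CR to LF so any line-ending style splits correctly
    sText = Replace(sText, vbCrLf, vbLf)
    sText = Replace(sText, vbCr, vbLf)
    linesOfText = Split(sText, vbLf)

    For i = LBound(linesOfText) To UBound(linesOfText)
        If InStr(1, linesOfText(i), sFind, vbTextCompare) > 0 Then
            ' Split returns a zero-based array, so the file line number is i + 1
            results.Add CStr(i + 1) & vbTab & linesOfText(i)
        End If
    Next i

    Set FindMatchingLines = results
End Function
```

In the procedure above, this could be called as Set matches = FindMatchingLines(sText, ChrW(&H898B)) right after st.ReadText, and the returned Collection then written wherever the results are needed.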

Domenic
  • Thank you very much @domenic. The above code worked like a charm! Just one question: what if the text from st.ReadText contains duplicate strings and I want multiple results? – Gaurav Jun 19 '23 at 05:00
  • As the code loops through each element of the array containing a line of text, any line that matches the criteria will be copied. – Domenic Jun 19 '23 at 14:47
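
For completeness, the question also asked about reading the file line by line rather than all at once. A minimal sketch of that approach, assuming the same ADODB reference as the code above, uses the stream's LineSeparator property with ReadText(adReadLine) and loops until EOS. The function name FindInUtf8File and the result format are hypothetical, and the adCRLF setting assumes Windows line endings.

```vba
' Hypothetical helper: scans a UTF-8 file one line at a time and returns
' "lineNumber<TAB>lineText" for each line that contains sFind.
' Requires a reference to Microsoft ActiveX Data Objects.
Function FindInUtf8File(ByVal sPathname As String, ByVal sFind As String) As Collection
    Dim st As ADODB.Stream
    Dim results As New Collection
    Dim sLine As String
    Dim lineNo As Long

    Set st = New ADODB.Stream
    st.Type = adTypeText
    st.Charset = "utf-8"
    st.LineSeparator = adCRLF   ' assumption: CRLF endings; use adLF for LF-only files
    st.Open
    st.LoadFromFile sPathname

    Do Until st.EOS
        ' adReadLine reads up to (and consumes) the next line separator
        sLine = st.ReadText(adReadLine)
        lineNo = lineNo + 1
        If InStr(1, sLine, sFind, vbTextCompare) > 0 Then
            results.Add CStr(lineNo) & vbTab & sLine
        End If
    Loop

    st.Close
    Set FindInUtf8File = results
End Function
```

Every matching line is added to the Collection, so a file with several matches yields several entries, each carrying its own line number.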