0

There are many hits on google about this but i'm wondering what is the best/fastest way to get some data out of a CSV file? There are some that load the entire CSV file in excel, some load it in an array. I've seen some people like to do search for a specific word.

Basically I need to retrieve 4 values out of each present CSV file. (start/end time, equipment and substrate) Note that the equipment will repeat itself multiple times inside every file. The other 3 are unique.

Which method is best/fastest?

Here's a small example of the CSV file:

/port_name   A
#data 01
  @slot_no        2
  @m_start 2020/03/26 19:15:27
  @m_end   2020/03/26 19:23:21
  @u_start  ????/??/?? ??:??:??
  @u_end    ????/??/?? ??:??:??
  $result 1 1 -4,-4 2548
    <result_info>    1 : Kind                 : 
    &no_of_image 3
    &i_name 01 S02.tif
     ~i_info    Digital_Zoom            1.0
     ~i_info    Equipment               4000 SERIAL NO. : 31
    &i_name 02 S02.tif
     ~i_info    Digital_Zoom            1.0
     ~i_info    Equipment               4000 SERIAL NO. : 31
~CMS_substrate_id      2 "8939-02"
/end_of_file

enter image description here

Wesley
  • 190
  • 9
  • That is definitely not a csv file. It doesn't conform to the usual CSV syntax. You may need to do custom processing to handle that. – geffchang Apr 05 '20 at 08:04
  • is it possible that it looks like this because I have it open in notepad? – Wesley Apr 05 '20 at 08:08
  • In this case you are retrieving only 4 values, and you are only reading from a range I do not see using arrays necessary. From my experience, arrays are a major advantage when read & writing to ranges, but retrieving 4 values should be as fast searching a range. – Dean Apr 05 '20 at 08:09
  • @Wesley I would also recommend posting an image of what it looks like in Excel, not in Notepad, as you will be doing your processing using Excel structure. – Dean Apr 05 '20 at 08:14
  • a CSV files typically looks like a 'Comma Separated Values'. like i.e.: `1,2,3` or `1,"two",3`. see https://en.wikipedia.org/wiki/Comma-separated_values – Luuk Apr 05 '20 at 08:14
  • 1
    Use regexp by vba. – Dy.Lee Apr 05 '20 at 08:15
  • @Dean If I open it in excel it looks the same. Every line is loaded only into colum A. I can deseparate all pieces in different columns but I don't think this is the way to go? – Wesley Apr 05 '20 at 08:19
  • @Dy.Lee thanks, I will look into it – Wesley Apr 05 '20 at 08:19
  • @Wesley can also you `Text To Columns` – Dean Apr 05 '20 at 08:20

3 Answers3

0

A quick start of a macro might look like this:

Sub readCSVfile()
    Dim textline As String
    Dim Filename
    Filename = "D:\TEMP\excel\61039635\CSVfile.txt"
    Dim row As Integer

    Cells(1, 1).Value = "m_start"
    Cells(1, 2).Value = "m_end"
    Cells(1, 3).Value = "Equipment"
    Cells(1, 4).Value = "CMS_substrate_id"
    row = 2
    Open Filename For Input As #1
    Do While Not EOF(1)
        Line Input #1, textline
        Select Case True
            Case InStr(textline, "@m_start") > 0:
                Cells(row, 1).Value = mysub(textline, "@m_start")
            Case InStr(textline, "@m_end") > 0:
                Cells(row, 2).Value = mysub(textline, "@m_end")
            Case InStr(textline, "Equipment") > 0:
                Cells(row, 3).Value = mysub(textline, "Equipment")
            Case InStr(textline, "CMS_substrate_id") > 0:
                Cells(row, 4).Value = mysub(textline, "CMS_substrate_id")
                row = row + 1
        End Select

    Loop

    Close (1)

End Sub
Function mysub(t As String, s As String) As String
    mysub = Trim(Mid(t, InStr(t, s) + Len(s) + 1))
End Function

enter image description here

Luuk
  • 12,245
  • 5
  • 22
  • 33
  • i saw a simular example when searching for a solution :) However I'm not use to work with the `for input` as well as `select case` If I copy you code it doesn't work for me. It copies almost all data in A2 – Wesley Apr 05 '20 at 09:52
  • Nevermind, it is probably conflicting with something else in my code. If I copy it to a blanc excel file it works – Wesley Apr 05 '20 at 09:58
  • I found why it is not working. It doesn't open the file as a text file.... If I save the file as textfile it works. How can I forse excel to open the file as a textfile? – Wesley Apr 05 '20 at 10:30
  • I have NO clue which file you are talking about, the file in my example is opened the default way that excel opens a file, see: https://learn.microsoft.com/en-us/office/vba/language/reference/user-interface-help/open-statement . – Luuk Apr 05 '20 at 12:37
  • Hey @Luuk, first time I come across ```Select Case True``` in programing languages. Is this structure only working in VB or others languages as well (C, python, etc.) ? – Vincent Apr 05 '20 at 13:42
  • And how is it any better than the typical if structure since testing expression doesn't depend on a single variable ? – Vincent Apr 05 '20 at 13:47
  • @Vincent: Its the basic use of how Select Case is defined, see also: https://stackoverflow.com/questions/794036/select-case-true , and the [docs](https://learn.microsoft.com/en-us/office/vba/language/reference/user-interface-help/select-case-statement), which say `Select Case testexpression .....`, why can a testexpression not simply be `true` – Luuk Apr 05 '20 at 14:37
  • @Luuk Select case is common, but not so much with ```True``` as testexpression ^^ and I found an answer for my second question (stackoverflow.com/questions/15144104/vb-net-how-give-best-performance-select-case-or-if-elseif-else-end-i) – Vincent Apr 05 '20 at 14:59
0

My answer is similar to @Luuk, but I'm not checking for "Equipment" as it appears in the sample data twice per record. Instead, I am checking for "&i_name 01" and then skipping down a few lines.

Sub sGetData()
    On Error GoTo E_Handle
    Dim strFile As String
    Dim intFile As Integer
    Dim strInput As String
    Dim lngRow As Long
    strFile = "J:\downloads\sample.txt"
    intFile = FreeFile
    Open strFile For Input As intFile
    lngRow = 1
    Do
        Line Input #intFile, strInput
        If InStr(strInput, "@m_start") > 0 Then
            lngRow = lngRow + 1
            ActiveSheet.Cells(lngRow, 1) = Mid(strInput, 12)
        ElseIf InStr(strInput, "@m_end") > 0 Then
            ActiveSheet.Cells(lngRow, 2) = Mid(strInput, 12)
        ElseIf InStr(strInput, "&i_name 01") > 0 Then
            Line Input #intFile, strInput
            Line Input #intFile, strInput
            ActiveSheet.Cells(lngRow, 3) = Mid(strInput, 41, 4)
        ElseIf InStr(strInput, "~CMS_substrate_id") > 0 Then
            ActiveSheet.Cells(lngRow, 4) = Mid(strInput, 24)
        End If
    Loop Until EOF(intFile)
sExit:
    On Error Resume Next
    Reset
    Exit Sub
E_Handle:
    MsgBox Err.Description & vbCrLf & vbCrLf & "sGetData", vbOKOnly + vbCritical, "Error: " & Err.Number
    Resume sExit
End Sub

As this data file is probably not line terminated with the normal Carriage Return/Line Feed combination that VBA deals with, I've created a new sub that reads the data into an array, split on the end of line character being used (in this case Line Feed) before processing it.

Sub sGetData2()
    On Error GoTo E_Handle
    Dim strFile As String
    Dim intFile As Integer
    Dim strInput As String
    Dim astrData() As String
    Dim lngLoop1 As Long
    Dim lngCount As Long
    Dim lngRow As Long
    strFile = "J:\downloads\sample1.txt"
    intFile = FreeFile
    Open strFile For Input As intFile
    strInput = input(LOF(intFile), intFile)
    astrData() = Split(strInput, vbLf)
    lngCount = UBound(astrData)
    lngRow = 1
    For lngLoop1 = 3 To lngCount
        If InStr(astrData(lngLoop1), "@m_start") > 0 Then
            lngRow = lngRow + 1
            ActiveSheet.Cells(lngRow, 1) = Mid(astrData(lngLoop1), 12)
        ElseIf InStr(astrData(lngLoop1), "@m_end") > 0 Then
            ActiveSheet.Cells(lngRow, 2) = Mid(astrData(lngLoop1), 12)
        ElseIf InStr(astrData(lngLoop1), "&i_name 01") > 0 Then
            lngLoop1 = lngLoop1 + 2
            ActiveSheet.Cells(lngRow, 3) = Mid(astrData(lngLoop1), 41, 4)
        ElseIf InStr(astrData(lngLoop1), "~CMS_substrate_id") > 0 Then
            ActiveSheet.Cells(lngRow, 4) = Mid(astrData(lngLoop1), 24)
        End If
    Next lngLoop1
sExit:
    On Error Resume Next
    Reset
    Exit Sub
E_Handle:
    MsgBox Err.Description & vbCrLf & vbCrLf & "sGetData2", vbOKOnly + vbCritical, "Error: " & Err.Number
    Resume sExit
End Sub

Regards,

Applecore
  • 3,934
  • 2
  • 9
  • 13
  • that actually realy smart to have the offset from i_name 01. But I wonder if this is the best way since it will read every line from the file. Is there a way to manipulate where it looks into the lines? the first 3 are in the beginning and the fourth is one of the last lines. Some textfiles have more 50 000 lines – Wesley Apr 05 '20 at 12:07
  • You may find it faster to open the file and read the whole lot as one (using "strInput=Input(LOF(intFile),intFile)"), using an array to hold the data split on a line by line basis ("astrData()=Split(strInput,vbCrLf)", and then looping the array from the 4th element. This would only work if the 4 pieces of data are always offset by the same amount. – Applecore Apr 05 '20 at 12:22
  • the offset is not the same amount. I'm trying the code on the original file but after the first line it skips to end of file. Even if I delete all data before the m_start so m_start is the first line it doesn't recognize it. If I run it on the small example that I put my questions it works. Very strange – Wesley Apr 05 '20 at 12:28
  • I suspect that your end of line marker is causing the problem, and the code is reading everything as one line. When you run the code, put a breakpoint in on "Line Input #intFile, strInput" and see what is being stored there. – Applecore Apr 05 '20 at 12:34
  • `strInp` returns the whole text file at once indeed – Wesley Apr 05 '20 at 12:43
0
  1. Get String from your text file by adodb.stream object.
  2. Extract what you are looking for from the imported string with regexp.
  3. Put the second contents of the submatches of the extracted match collection into an array. Equipment items have two identical contents, so they are increased by two.
  4. The data in the array is transferred to the sheet.

Sub Test()
    Dim Ws As Worksheet
    Dim Path As String
    Dim s As String
    Dim pattn(1 To 4) As String
    'Dim Match(1 To 4) As MatchCollection
    Dim Match(1 To 4) As Object
    Dim vR() As Variant
    Dim i As Long, n As Long, j As Integer, k As Long

    Path = ThisWorkbook.Path & "\regextest.txt" '<~~ Your text file full Path

    s = getString(Path) '<~~ get text form your text file

    Set Ws = ActiveSheet

    '** This is regular Expression
    pattn(1) = "(m_start[ ]{1,})(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})"
    pattn(2) = "(m_end[ ]{1,})(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})"
    pattn(3) = "(~i_info    Equipment[ ]{1,})(\d{1,})"
    pattn(4) = "(~CMS_substrate_id[ ]{1,})(\d{1,}[ ]{1,}" & Chr(34) & "\d{1,}-\d{1,}" & Chr(34) & ")"

    For i = 1 To 4
        Set Match(i) = GetRegEx(s, pattn(i))
    Next i

    n = Match(1).Count
    ReDim vR(1 To n, 1 To 4)

    For i = 0 To n - 1
        For j = 1 To 4
            If j = 3 Then
                vR(i + 1, j) = Match(j).Item(k).SubMatches(1)
                k = k + 2
            Else
                vR(i + 1, j) = Match(j).Item(i).SubMatches(1)
            End If
        Next j
    Next i
    With Ws
        .Cells.Clear
        .Range("a1").Resize(1, 4) = Array("m_start", "m_end", "Equipment", "CMS_substrate_id")
        .Range("a2").Resize(n, 4) = vR
        .Range("a:b").NumberFormatLocal = "yyyy/mm/dd hh:mm:ss"
    End With
End Sub

Function GetRegEx(StrInput As String, strPattern As String) As Object
    'Dim RegEx As New RegExp
    Dim RegEx As Object

    'Set RegEx = New RegExp
    Set RegEx = CreateObject("VBscript.RegExp")
    With RegEx
        .Global = True
        .IgnoreCase = False
        .MultiLine = True
        .Pattern = strPattern
    End With
    If RegEx.Test(StrInput) Then
        Set GetRegEx = RegEx.Execute(StrInput)
    End If
End Function

Function getString(Path As String)
    'Dim objStream As ADODB.Stream
    Dim objStream As Object

    'Set objStream = New ADODB.Stream
    Set objStream = CreateObject("ADODB.Stream")

    With objStream
        .Charset = "Utf-8"
        .Open
        .LoadFromFile Path
        getString = .ReadText
        .Close
    End With

End Function

Result image (3 types of data)

enter image description here

Dy.Lee
  • 7,527
  • 1
  • 12
  • 14