2

I have a csv file that is comma delimited with a Lf terminator and I want to convert this to a two dimensional array for further processing. I successfully read the file into a string and modified the line terminator from Lf to Cr as I understand vba does not recognize Lf as a terminator. I want to take this string and create a two dimensional array where I can also determine the number of rows and columns as the original csv file can be of arbitrary size. The csv file can also be large (>300 MB and >1M elements)

I haven't been able to figure out how to determine the number of columns even though there is a CR terminator in the string for each data line. The code below shows how I created the string which is comma delimited with Cr as the terminator

Open fName For Binary As #1

     Buf$ = String$(LOF(1), 0)
     Get 1, , Buf$
     Buf$ = Replace$(Buf$, vbLf, vbCr) ' Replace LF with CR

Close #1

I would like a method or function that will create a two dimensional array and calculate the number of rows and columns

GSD
  • 1,252
  • 1
  • 10
  • 12
XGeek
  • 21
  • 3
  • I really have no idea what kind of execution speeds you'll get, but there is the String.Split function. I'd think you could split your string on the CR to yield one array, and then iterate over that array and split each element by commas and thus build out your multidimensional array. – Ryan B. Apr 19 '19 at 14:11
  • 1
    If your field values could contain commas then you will have more work to do. Have you tried just opening the file directly in Excel? That would be the most straightforward approach. – Tim Williams Apr 19 '19 at 15:14
  • I can try the String.Split function and measure the performance which I was also concerned about so did not initially go down that route. I'm trying to avoid reading or writing any values into cells as I know that is very slow and I'm working with large arrays of close to a million elements. – XGeek Apr 19 '19 at 16:41
  • @XGeek have you tried using replacing with a non-standard symbol (e.g., µ) and using texttocolumns using that symbol as the delimiter? `.TextToColumns Destination:=Range("A1"), DataType:=xlDelimited, Other:=True, OtherChar:="µ"` using that symbol – Cyril Apr 19 '19 at 17:44

2 Answers2

1

The first thing coming to mind is using InStr() which displays the location of the character in a string. Define your array dimensions beforehand (last row of your dataset); I will use i as iterator fro teh loop in the brief examples)

sep = InStr(Cells(i,1).Value,"BS") 'swap BS for whatever you want to separate on

Then use the separate (sep) to either array

arr(i,1) = left(Cells(i,1).value,sep-1)
arr(i,2) = right(Cells(i,1).value,len(cells(i,1))-sep+2) '2 characters in sep "b" and "s" so adding that back

The change I would suggest to your code is to use some symbol or string rather than a carriage return so it is easily separated.

Cyril
  • 6,448
  • 1
  • 18
  • 31
  • I am trying to parse a string variable that was created from a large csv file that is delimited with commas and has Cr as a line terminator. I could create a different line terminator if that would help. This file is intended to create an array with around 300 columns and 100's of thousands of rows in the end. I'm not sure how I could use InStr as there are many comma delimeters and upfront I do not know the size of the array that is needed as it will be variable – XGeek Apr 19 '19 at 16:56
  • @XGeek you can dynamically determine the columns/rows (see [this](https://www.rondebruin.nl/win/s9/win005.htm)). With that, you can create an array (stored within VBA so it will allow much faster itterating) using the dynamic variables. – Cyril Apr 19 '19 at 17:32
1

I would attempt this by:

  1. First splitting the data into rows using vbNewLine.
  2. Then loop your split array, and split each iteration using your comma delimiter. Leaving you with a jagged array (an array containing arrays).
  3. Lastly, you now have the dimensions for the two dim array, ReDim and loop over it to add all your data.

For abstraction purposes, I separated this task into three separate functions.


Main function to Call

I made the delimiter optional so it's not restrictive to only commas.

This does not account for escaped characters, you'll have to add that if that is needed.

Private Function TextFileToArray(ByVal FilePath As String, Optional ByVal Delimiter As String = ",") As Variant

    'READ THE CONTENTS FROM TEXTFILE
    Dim FileContent As String
    FileContent = ReadTextFile(FilePath)

    'SEPERATE THE ROWS USING vbNewLine
    Dim SplitData As Variant
    SplitData = Split(FileContent, vbNewLine)

    'CREATE A JAGGED ARRAY BY SPLITTING EACH STRING
    Dim JaggedArray As Variant
    ReDim JaggedArray(LBound(SplitData, 1) To UBound(SplitData, 1))

    Dim Index As Long
    For Index = LBound(SplitData, 1) To UBound(SplitData, 1)
        JaggedArray(Index) = Split(SplitData(Index), Delimiter)
    Next Index

    'CONVERT JAGGED ARRAY INTO A TWO DIM ARRAY
    TextFileToArray = JaggedArrayToTwoDimArray(JaggedArray)

End Function

Read the content of the text file

This could be written in the main function, but it is normally best to try to separate your code into smaller blocks.

Public Function ReadTextFile(ByVal FilePath As String) As String

    Dim Fso As Object
    Set Fso = CreateObject("Scripting.FileSystemObject")

    Dim Ts As Object
    Set Ts = Fso.OpenTextFile(FilePath, 1, False)

    ReadTextFile = Ts.ReadAll

End Function

Convert Jagged Array into a 2d Array

I've stored the bounds in variables to make it easier to debug and read.

Private Function JaggedArrayToTwoDimArray(ByVal SourceArray As Variant) As Variant

    'CAPTURE BOUNDS
    Dim LB1 As Long
    LB1 = LBound(SourceArray, 1)

    Dim UB1 As Long
    UB1 = UBound(SourceArray, 1)

    Dim LB2 As Long
    LB2 = LBound(SourceArray(LB1), 1)

    Dim UB2
    UB2 = UBound(SourceArray(UB1), 1)

    'SET BOUNDS OF RETURN ARRAY
    Dim ReturnArray As Variant
    ReDim ReturnArray(LB1 To UB1, LB2 To UB2)

    'POPULATE TWO DIM ARRAY FROM JAGGED ARRAY
    Dim RowIndex As Long
    For RowIndex = LB1 To UB1

        Dim ColIndex As Long
        For ColIndex = LB2 To UB2
            ReturnArray(RowIndex, ColIndex) = SourceArray(RowIndex)(ColIndex)
        Next ColIndex

    Next RowIndex

    JaggedArrayToTwoDimArray = ReturnArray

End Function

Please feel free to add error handling and any changes that might be needed.

Robert Todar
  • 2,085
  • 2
  • 11
  • 31
  • Will the Split function using vbNewLine as the delimiter work with a LF terminator only or does it need a CR? I'm not very familiar with Jagged Arrays so why are they helpful in this case? – XGeek Apr 19 '19 at 17:24
  • VbNewLine is a constant that should work with either LF or CR, so you shouldn't have to change it. A jagged array is simply an array that contains arrays; example: Array(Array(0, 1), Array(0, 2), Array(0, 3)). The advantage is that we can split each row of text into its own array, separating it by commas. – Robert Todar Apr 19 '19 at 18:09
  • 1
    The code you proposed is largely working with 2 exceptions: the Split function failed as vbNewLine was ignored resulting in all data getting mapped to the first array element. I switched the delimiter to vbLf, which is the terminator in the file, then it worked, mapping each data row to a unique array element. The creation of the two dimensional array also failed but that was because my csv source file has a blank last row which caused UB2 to go negative which is an easy fix. Thinking about this, can I just access the data from the JaggedArray and avoid creating the 2D array? – XGeek Apr 19 '19 at 22:18
  • Interesting that VbNewLine wasn't working, I'm glad you got it to work though. I found [this](https://stackoverflow.com/questions/43876586/why-doesnt-vba-replace-function-work-with-crlf-in-word-and-excel) stack overflow answer that shows what is going on with line feeds. Just curious, what operating system are you using? And yes, you could simply access the Jagged Array in this example if that works for you. – Robert Todar Apr 22 '19 at 14:56
  • I am using Windows but the csv file comes from an Android app which only has LF terminators. I believe that VbNewLine did not work for me as it is equivalent to Chr(13) + Chr(10). Your jagged array code worked great but I was struggling with performance so I ended up just using a string array created with the Split function and selecting the data in the comma delimited string I needed on the fly vs. trying to create an array upfront which avoided the long wait time. You put me on the right path so thank you very much! – XGeek Apr 23 '19 at 17:51