4

So after doing some research I was able to find the format I need to get the CSV File into

Subject,Start Date,Start Time,End Date,End Time,All Day Event,Description,Location,Private

The issue is, the CSV export I'm working with is not in the correct format or order, what is the best way to go about gathering that information? Here is a bit of my source.

Name,User Name,Row Type,Start Date,Start Time,End Time,End Date,Segment Start Date,Type

"Smith, John J",jjs,Shift,5/29/2011,9:30,17:30,5/29/2011,5/29/2011,Regular

"Smith, John J",jjs,Shift,5/30/2011,13:30,17:30,5/30/2011,5/30/2011,Regular

    Dim Name As String = ""
    Dim UserName As String = ""

    Dim Data As String = """Smith, John J"",jj802b,Shift,5/29/2011,9:30,17:30,5/29/2011,5/29/2011,Transfer"

    For r As Integer = 1 To 10
        Name = Data.Substring(0, Data.LastIndexOf(""""))
        Data = Data.Remove(0, Data.LastIndexOf(""""))
        UserName = Data.Substring(Data.LastIndexOf(""""), ",")
    Next
Landmine
  • 1,759
  • 6
  • 39
  • 60
  • 1
    Your input data does not seem to have all the fields that the Google format requires. Where are you intending to get the missing data from? –  May 28 '11 at 09:33
  • Be careful when exporting the data, because your output has to be acceptable. For instance, If the name that you read was John "Smarty" Pants, you might need to generate an output as ""John ""Smartie"" Pants"", ... . – luiscolorado Jun 03 '11 at 20:26

3 Answers3

3

Following is the solution

Dim Name As String = ""
Dim UserName As String = ""

Dim Data As String = """Smith, John J"",jj802b,Shift,5/29/2011,9:30,17:30,5/29/2011,5/29/2011,Transfer"

For r As Integer = 1 To 10
    Dim DataArr() As String = DecodeCSV(Data) 'Use DecodeCSV function to regex split the string 
    Name = DataArr(0) 'Get First item of array as Name
    UserName = DataArr(1)  'Get Second item of array as UserName 
Next

Great Code for DecodeCSV by Tim

Public Shared Function DecodeCSV(ByVal strLine As String) As String()

    Dim strPattern As String
    Dim objMatch As Match

    ' build a pattern
    strPattern = "^" ' anchor to start of the string
    strPattern += "(?:""(?<value>(?:""""|[^""\f\r])*)""|(?<value>[^,\f\r""]*))"
    strPattern += "(?:,(?:[ \t]*""(?<value>(?:""""|[^""\f\r])*)""|(?<value>[^,\f\r""]*)))*"
    strPattern += "$" ' anchor to the end of the string

    ' get the match
    objMatch = Regex.Match(strLine, strPattern)

    ' if RegEx match was ok
    If objMatch.Success Then
        Dim objGroup As Group = objMatch.Groups("value")
        Dim intCount As Integer = objGroup.Captures.Count
        Dim arrOutput(intCount - 1) As String

        ' transfer data to array
        For i As Integer = 0 To intCount - 1
            Dim objCapture As Capture = objGroup.Captures.Item(i)
            arrOutput(i) = objCapture.Value

            ' replace double-escaped quotes
            arrOutput(i) = arrOutput(i).Replace("""""", """")
        Next

        ' return the array
        Return arrOutput
    Else
        Throw New ApplicationException("Bad CSV line: " & strLine)
    End If

End Function
Sachin Chavan
  • 5,578
  • 5
  • 49
  • 75
2

Depending on the exact content and guarantee of format of a CSV file, for speed and ease, sometimes using split on the , is the easiest and fastest way to parse the file. Your Name col includes a , that is not a delimiter, which adds a little bit of complication though it is still trivial to handle that case assuming the name always contains 1 ,.

There are libraries to parse CSV files, which can be useful. Assuming you don't need to handle all files that conform to the CSV spec I feel that they are overkill. With all that said you can use the following regular expression to easily parse the CSV file with named groups for convince:

"(?<Name>[^"]+?)",(?<UserName>[^,]+?),(?<RowType>[^,]+?),(?<StartDate>[^,]+?),(?<StartTime>[^,]+?),(?<EndTime>[^,]+?),(?<EndDate>[^,]+?),(?<SegmentStartDate>[^,]+?),(?<Type>\w+)

That will create named capture groups that you can then use to output to your new CSV file like so:

Dim ResultList As StringCollection = New StringCollection()
Try
    Dim RegexObj As New Regex("""(?<Name>[^""]+?)"",(?<UserName>[^,]+?),(?<RowType>[^,]+?),(?<StartDate>[^,]+?),(?<StartTime>[^,]+?),(?<EndTime>[^,]+?),(?<EndDate>[^,]+?),(?<SegmentStartDate>[^,]+?),(?<Type>\w+)", RegexOptions.IgnoreCase)
    Dim MatchResult As Match = RegexObj.Match(SubjectString)
    While MatchResult.Success
        'Append to new CSV file - MatchResult.Groups("groupname").Value

        'Name = MatchResult.Groups("Name").Value
        'Start Time = MatchResult.Groups("StartTime").Value         
        'End Time = MatchResult.Groups("EndTime").Value
        'Etc...
    End While
Catch ex As ArgumentException
    'Syntax error in the regular expression
End Try

See .NET Framework Regular Expressions on MSDN for more information.

daalbert
  • 1,465
  • 9
  • 7
2

A few things I'd like to note:

  • One is that I'm using a TextFieldParser, which you can find under the FileIO namespace, to work with the input CSV. This makes reading delimited files a lot easier than trying to deal with regular expressions and your own parsing, etc.
  • The other is that to store the data sets I am using a List(Of Dictionary(Of String, String)), or a list of dictionaries that relate strings to other strings. Essentially this is not much different from the access pattern of a DataTable and if you are more comfortable with that construct, you're welcome to use it instead. The list of dictionaries behaved exactly the same and required a lot less setup so it is used here in its stead.

I admit some of this is hard-coded but if you need to generalize the procedure, you can move certain aspects out to application settings and/or better decompose the function. The point here was to give you a general idea. The code is commented inline below:

    ' Create a text parser object
    Dim theParser As New FileIO.TextFieldParser("C:\Path\To\theInput.csv")

    ' Specify that fields are delimited by commas
    theParser.Delimiters = {","}

    ' Specify that strings containing the delimiter are wrapped by quotes
    theParser.HasFieldsEnclosedInQuotes = True

    ' Dimension containers for the field names and the list of data rows
    ' Initialize the field names with the first row r
    Dim theInputFields As String() = theParser.ReadFields(),
        theInputRows As New List(Of Dictionary(Of String, String))()

    ' While there is data to parse
    Do While Not theParser.EndOfData

        ' Dimension a counter and a row container
        Dim i As Integer = 0,
            theRow As New Dictionary(Of String, String)()

        ' For each field
        For Each value In theParser.ReadFields()

            ' Associate the value of that field for the row
            theRow(theInputFields(i)) = value

            ' Increment the count
            i += 1
        Next

        ' Add the row to the list
        theInputRows.Add(theRow)
    Loop

    ' Close the input file for reading
    theParser.Close()

    ' Dimension the list of output field names and a container for the list of formatted output rows
    Dim theOutputFields As New List(Of String) From {"Subject", "Start Date", "Start Time", "End Date", "End Time", "All Day Event", "Description", "Location", "Private"},
        theOutputRows As New List(Of Dictionary(Of String, String))()

    ' For each data row we've extracted from the CSV
    For Each theRow In theInputRows

        ' Dimension a new formatted row for the output
        Dim thisRow As New Dictionary(Of String, String)()

        ' For each field name of the output rows
        For Each theField In theOutputFields

            ' Dimension a container for the value of this field
            Dim theValue As String = String.Empty

            ' Specify ways to get the value of the field based on its name
            ' These are just examples; choose your own method for formatting the output
            Select Case theField

                Case "Subject"
                    ' Output a subject "[Row Type]: [Name]"
                    theValue = theRow("Row Type") & ": " & theRow("Name")

                Case "Description"
                    ' Output a description from the input field [Type]
                    theValue = theRow("Type")

                Case "Start Date", "Start Time", "End Date", "End Time"
                    ' Output the value of the field with a correlated name
                    theValue = theRow(theField)

                Case "All Day Event", "Private"
                    ' Output False by default (you might want to change the case for Private
                    theValue = "False"

                Case "Location"
                    ' Can probably be safely left empty unless you'd like a default value
            End Select

            ' Relate the value we've created to the column in this row
            thisRow(theField) = theValue
        Next

        ' Add the formatted row to the output data
        theOutputRows.Add(thisRow)
    Next

    ' Start building the first line by retriving the name of the first output field
    Dim theHeader As String = theOutputFields.First

    ' For each of the remaining output fields
    For Each theField In (From s In theOutputFields Skip 1)

        ' Append a comma and then the field name
        theHeader = theHeader & "," & theField
    Next

    ' Create a string builder to store the text for the output file, initialized with the header line and a line break
    Dim theOutput As New System.Text.StringBuilder(theHeader & vbNewLine)

    ' For each row in the formatted output rows
    For Each theRow In theOutputRows

        ' Dimension a container for this line of the file, beginning with the value of the column associated with the first output field
        Dim theLine As String = theRow(theOutputFields.First)

        ' Wrap the first value if necessary
        If theLine.Contains(",") Then theLine = """" & theLine & """"

        ' For each remaining output field
        For Each theField In (From s In theOutputFields Skip 1)

            ' Dereference and store the associated column value
            Dim theValue As String = theRow(theField)

            ' Add a comma and the value to the line, wrapped in quotations as needed
            theLine = theLine & "," & If(theValue.Contains(","), """" & theValue & """", theValue)
        Next

        ' Append the line to the output string
        theOutput.AppendLine(theLine)
    Next

    ' Write the formatted output to file
    IO.File.WriteAllText("C:\output.csv", theOutput.ToString)

For what it's worth, using your sample data seemed to result in the output file opening just fine in OpenOffice.org Calc using this code. The format of what you wish to output for the fields is up to you, so modify the appropriate Case statement in the Select to do so, and happy coding!

Jonathan Hall
  • 75,165
  • 16
  • 143
  • 189
lsuarez
  • 4,952
  • 1
  • 29
  • 51