1

I have a CSV document with the headers shown as below:

Date,Time,TIRCA-501 [°C],PIRCA-501 [MPa],TIRCA-502 [°C],TIRCA-503 [°C],TIR-504 [°C],WTRIA-501 [°C]

(The actual csv file is much more longer than that but I've just cut out the relevant part)

Here's the utility method I am using to parse the csv file:

public static bool TryReadFromCsvFile(string csvFilePath, out DataTable fileContent, bool isFirstRowHeader)
{
    fileContent = new DataTable();
    try
    {
        string header = isFirstRowHeader ? "Yes" : "No";

        string pathOnly = Path.GetDirectoryName(csvFilePath);
        string fileName = Path.GetFileName(csvFilePath);

        string sql = @"SELECT * FROM [" + fileName + "] ";

        using (OleDbConnection connection = new OleDbConnection(
            String.Format("Provider=Microsoft.Jet.OLEDB.4.0;Data Source=\"{0}\";Extended Properties=\"Text;CharacterSet=65001;ImportMixedTypes=Text;IMEX=1;HDR={1};FMT=Delimited;TypeGuessRows=0\"",pathOnly,header)))
        using (OleDbCommand command = new OleDbCommand(sql, connection))
        using (OleDbDataAdapter adapter = new OleDbDataAdapter(command))
        {
            fileContent.Locale = CultureInfo.CurrentCulture;
            adapter.Fill(fileContent);
            return true;
        }
    }
    catch (Exception ex)
    {
        //Logging utility here
        return false;
    }
}

The method generally works fine but for the above data, the square bracket '[' is getting replaced a regular bracket '(' in the end result of the parse.

Just to prove I haven't lost my sanity, here's proof (screenshots taken with debugger):

enter image description here

enter image description here

I've also checked the hex code of the problematic square bracket in the original file. It's 5B, which is clearly denoted as left square bracket in UTF-8.

Why does OLEDB import cause this? How can I prevent this behavior?

Edit: I realize there exist many other ways of parsing CSV files. Heck, I can even read the content as list of strings and split by commas. I'm just trying to understand why Oledb causes an issue like this so I can decide whether to scrap the utility method all-together or not. I'd like to see an answer with an authoritative source on this.

TtT23
  • 6,876
  • 34
  • 103
  • 174
  • Probably because square brackets are used for long field name escaping in OleDb ( `[this is a long field name]` ). Try using `TextFieldParser` instead of OleDb. – Sam Axe Jul 14 '15 at 03:04

1 Answers1

0

I am a personal fan of Microsoft.VisualBasic.FileIO.TextFieldParser for csv parsing. Add a reference to Microsoft.VisualBasic. I have saved your header as a ANSI encoded csv.

string dataCsv;     
using (var csvReader = new TextFieldParser(
    dataCsv, 
    Encoding.GetEncoding("iso-8859-1"), 
    true))
{
    csvReader.TextFieldType = FieldType.Delimited;
    csvReader.SetDelimiters(",");

    while (!csvReader.EndOfData)
    {
        try
        {
            string[] currentRow = csvReader.ReadFields();
            // turn that to a DataRow
        }
        catch (MalformedLineException ex) { }
    }
    // build the DataTable and add all DataRows 
}
andrei.ciprian
  • 2,895
  • 1
  • 19
  • 29