I have a CSV document with the headers shown as below:
Date,Time,TIRCA-501 [°C],PIRCA-501 [MPa],TIRCA-502 [°C],TIRCA-503 [°C],TIR-504 [°C],WTRIA-501 [°C]
(The actual csv file is much more longer than that but I've just cut out the relevant part)
Here's the utility method I am using to parse the csv file:
public static bool TryReadFromCsvFile(string csvFilePath, out DataTable fileContent, bool isFirstRowHeader)
{
fileContent = new DataTable();
try
{
string header = isFirstRowHeader ? "Yes" : "No";
string pathOnly = Path.GetDirectoryName(csvFilePath);
string fileName = Path.GetFileName(csvFilePath);
string sql = @"SELECT * FROM [" + fileName + "] ";
using (OleDbConnection connection = new OleDbConnection(
String.Format("Provider=Microsoft.Jet.OLEDB.4.0;Data Source=\"{0}\";Extended Properties=\"Text;CharacterSet=65001;ImportMixedTypes=Text;IMEX=1;HDR={1};FMT=Delimited;TypeGuessRows=0\"",pathOnly,header)))
using (OleDbCommand command = new OleDbCommand(sql, connection))
using (OleDbDataAdapter adapter = new OleDbDataAdapter(command))
{
fileContent.Locale = CultureInfo.CurrentCulture;
adapter.Fill(fileContent);
return true;
}
}
catch (Exception ex)
{
//Logging utility here
return false;
}
}
The method generally works fine but for the above data, the square bracket '[' is getting replaced a regular bracket '(' in the end result of the parse.
Just to prove I haven't lost my sanity, here's proof (screenshots taken with debugger):
I've also checked the hex code of the problematic square bracket in the original file. It's 5B, which is clearly denoted as left square bracket in UTF-8.
Why does OLEDB import cause this? How can I prevent this behavior?
Edit: I realize there exist many other ways of parsing CSV files. Heck, I can even read the content as list of strings and split by commas. I'm just trying to understand why Oledb causes an issue like this so I can decide whether to scrap the utility method all-together or not. I'd like to see an answer with an authoritative source on this.