4

I'm reading a csv file using the Lumenworks csv reader. Below is an example record

"001-0000265-003"|"Some detail"|"detal1"|"detail2"|"detal3"|"detail4"|"detail5"|"detail6"

I've created a class with below constructor to read this file

using (var input = new CsvReader(stream, true, '|'))
{
//logic to create an xml here
}

This works fine when there is no double quotes inside details. But when the scinarios like this

"001-0000265-003"|"Some " detail"|"detal1"|"detail2"|"detal3"|"detail4"|"detail5"|"detail6"

the reader throws an exception

An unhandled exception of type 'LumenWorks.Framework.IO.Csv.MalformedCsvException' occurred in LumenWorks.Framework.IO.dll

So then I used the CsvReader constructor which takes 7 arguments,

CsvReader(stream, true, '|', '"', '"', '#', LumenWorks.Framework.IO.Csv.ValueTrimmingOptions.All))

But still I'm getting the same error. Please provide any suggestions.

I'm reading some complex filed as follows,

"001-0000265-003"|"ABC 33"X23" CDE 32'X33" AAA, BB'C"|"detal1"|"detail2"|"detal3"|"detail4"|"detail5"|"detail6"
  • Why not try to read the csv file without this library ? – mybirthname Oct 15 '14 at 11:13
  • The reader is correct in saying it is an invalid format. @mybirthname: Why reinventing the wheel? – Tim Schmelter Oct 15 '14 at 11:14
  • @TimSchmelter yes it is correct, but when he reads the csv by himself he can fix the column names or the data. In this case the format will be correct. In other case he should fix the file. I don't know probably I'm not understanding the problem. Hope someone help him. – mybirthname Oct 15 '14 at 11:18
  • That is a malformed csv file. Should you be reading it? What is the purpose of all those quotes in the file? They seem like redundant separators. Use '|' as the only separator and ignore the quotes. Does the `CsvReader` have any property which allows quotes to be parsed as a normal character? If not, maybe you can delete them before parsing the file? You will still get malformed data (missing fields) but if thats not an issue...IMO you should bail out but you know your requirements better. – InBetween Oct 15 '14 at 11:20
  • You probably want to change the 4th and 5th parameters to the constructor to something other than a double quote so it will not treat double quotes as special characters. However I believe you would then have to handle the double quotes yourself. – juharr Oct 15 '14 at 11:28
  • @InBetween This is the requirement. I cannot change the input. I'll have to go with this :) – Isuru Siriwardana Oct 15 '14 at 11:30
  • I assume that the data will never contain a pipe character, otherwise this will be impossible to parse correctly since the double quote's purpose is to allow the delimiters to be part of the data, but an escape character is needed to allow for the double quote to also be part of the data. – juharr Oct 15 '14 at 11:34
  • @IsuruSiriwardana: maybe it would be simpler to not set the quoting-Character to `"` but just to use the pipe-delimiter. Then you can trim all `"` away from the fields by using `csv[i].Trim('"')`. – Tim Schmelter Oct 15 '14 at 12:14

1 Answers1

6

I've tested it with your sample data and it's pretty difficult to fix this malformed line(f.e. from the Catch-block). So i would not use a quoting-character, but instead just use the pipe-delimiter and remove the " later via csv[i].Trim('"').

Here's a method that parses the file and returns all lines' fields:

private static List<List<string>> GetAllLineFields(string fullPath)
{
    List<List<string>> allLineFields = new List<List<string>>();
    var fileInfo = new System.IO.FileInfo(fullPath);

    using (var reader = new System.IO.StreamReader(fileInfo.FullName, Encoding.Default))
    {
        Char quotingCharacter = '\0'; // no quoting-character;
        Char escapeCharacter = quotingCharacter;
        Char delimiter = '|';
        using (var csv = new CsvReader(reader, true, delimiter, quotingCharacter, escapeCharacter, '\0', ValueTrimmingOptions.All))
        {
            csv.DefaultParseErrorAction = ParseErrorAction.ThrowException;
            //csv.ParseError += csv_ParseError;  // if you want to handle it somewhere else
            csv.SkipEmptyLines = true;

            while (csv.ReadNextRecord())
            {
                List<string> fields = new List<string>(csv.FieldCount);
                for (int i = 0; i < csv.FieldCount; i++)
                {
                    try
                    {
                        string field = csv[i];
                        fields.Add(field.Trim('"'));
                    } catch (MalformedCsvException ex)
                    {
                        // log, should not be possible anymore
                        throw;
                    }
                }
                allLineFields.Add(fields);
            }
        }
    }
    return allLineFields;
}

Test and output with a file that contains your sample data:

List<List<string>> allLineFields = GetAllLineFields(@"C:\Temp\Test\CsvFile.csv");
    foreach (List<string> lineFields in allLineFields)
        Console.WriteLine(string.Join(",", lineFields.Select(s => string.Format("[{0}]", s))));

[001-0000265-003],[Some detail],[detal1],[detail2],[detal3],[detail4],[detail5],[detail6]
[001-0000265-003],[Some " detail],[detal1],[detail2],[detal3],[detail4],[detail5],[detail6]
Tim Schmelter
  • 450,073
  • 74
  • 686
  • 939
  • 2
    By this solution, when you have a delimiter inside a field e.g. `"foo|bar"` then `csv.ReadNextRecord()` throws an `MalformedCsvException`. How to escape the delimiter in these fields now? – Simon Nov 28 '16 at 12:54