While reading a CSV file, how can I configure CsvHelper to enforce that each row has no extra columns that are not found in the header? I cannot find any obvious property under CsvConfiguration nor under CsvHelper.Configuration.Attributes.

Context: In our CSV file format, the last column is a string description, which our users (using plain-text editors) sometimes forget to quote when the description contains commas. Such "raw" commas cause that row to have extra columns, and the description read into the software is truncated at the first raw comma. I want to detect this and throw an exception that suggests to the user that they may have forgotten to quote the description cell.

It looks like CsvConfiguration.DetectColumnCountChanges might be related, but the 29.0.0 library currently lacks IntelliSense descriptions for CsvConfiguration properties, so I have no idea how to use it.

Similar information for other CSV libraries:

colbster
  • I'm not aware of an option for that. You could pre-process your file: do a `.split(",")` on each line, and if the resulting array has more elements than your header has, throw the exception. – Juan Oct 28 '22 at 05:52
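
As a rough illustration of the pre-processing idea in the comment above: a plain split on commas would also flag rows whose description is correctly quoted, so this sketch counts fields while skipping delimiters inside double quotes. It uses only the standard library; the class and method names (`ColumnCountCheck`, `CountFields`, `FindRowsWithExtraColumns`) are hypothetical, and it assumes one record per line (no line breaks inside quoted fields).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of CsvHelper.
public static class ColumnCountCheck
{
    // Counts the delimited fields in one line, ignoring delimiters
    // that appear inside double-quoted fields.
    public static int CountFields(string line, char delimiter = ',')
    {
        var count = 1;
        var inQuotes = false;
        foreach (var c in line)
        {
            if (c == '"')
            {
                // An escaped quote ("") toggles twice, so the state is unchanged.
                inQuotes = !inQuotes;
            }
            else if (c == delimiter && !inQuotes)
            {
                count++;
            }
        }
        return count;
    }

    // Returns the 1-based line numbers of rows that have more fields
    // than the header, which is assumed to be the first line.
    public static List<int> FindRowsWithExtraColumns(IEnumerable<string> lines)
    {
        var result = new List<int>();
        var expected = -1;
        var lineNumber = 0;
        foreach (var line in lines)
        {
            lineNumber++;
            var count = CountFields(line);
            if (expected < 0)
            {
                expected = count; // header row sets the expected column count
            }
            else if (count > expected)
            {
                result.Add(lineNumber);
            }
        }
        return result;
    }
}
```

It could be fed from `File.ReadLines(path)` before the file is handed to the CSV reader, with an exception thrown for any line number it returns.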

1 Answer

You were on the right track with CsvConfiguration.DetectColumnCountChanges.

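using System;
using System.Globalization;
using System.IO;
using System.Linq;
using CsvHelper;
using CsvHelper.Configuration;

// The bare Main() suits LINQPad; in a console project, place it in a class as static void Main().
void Main()
{
    var config = new CsvConfiguration(CultureInfo.InvariantCulture)
    {
        // Throw a BadDataException when the number of columns changes between rows.
        DetectColumnCountChanges = true
    };

    // The last row has an extra column that is not in the header.
    using (var reader = new StringReader("Id,Name\n1,MyName\n2,YourName,ExtraColumn"))
    using (var csv = new CsvReader(reader, config))
    {
        try
        {
            var records = csv.GetRecords<Foo>().ToList();
        }
        catch (BadDataException ex)
        {
            if (ex.Message.StartsWith("An inconsistent number of columns has been detected."))
            {
                Console.WriteLine("There is an issue with an inconsistent number of columns on row {0}", ex.Context.Parser.RawRow);
                Console.WriteLine("Row data: \"{0}\"", ex.Context.Parser.RawRecord);
                Console.WriteLine("Please check for commas in a field that were not properly quoted.");
            }
            else
            {
                throw; // some other kind of bad data; do not swallow it
            }
        }
    }
}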

public class Foo
{
    public int Id { get; set; }
    public string Name { get; set; }
}
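
Since the question asks for an exception rather than console output, the same check could be wrapped so that the column-count error is rethrown with a user-facing hint. This is only a sketch: `StrictCsv`, `ReadAll`, and `Item` are hypothetical names, `InvalidDataException` is an arbitrary choice of exception type, and the message match relies on the same CsvHelper wording used in the sample above, which may differ in other versions.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Linq;
using CsvHelper;
using CsvHelper.Configuration;

// Hypothetical record type for illustration.
public class Item
{
    public int Id { get; set; }
    public string Name { get; set; }
}

// Hypothetical wrapper, not part of CsvHelper.
public static class StrictCsv
{
    public static List<T> ReadAll<T>(TextReader reader)
    {
        var config = new CsvConfiguration(CultureInfo.InvariantCulture)
        {
            DetectColumnCountChanges = true
        };

        using var csv = new CsvReader(reader, config);
        try
        {
            // ToList() forces the lazy enumeration inside the try block.
            return csv.GetRecords<T>().ToList();
        }
        catch (BadDataException ex)
            when (ex.Message.StartsWith("An inconsistent number of columns has been detected."))
        {
            // Rethrow with a hint for the user; the original exception is kept as InnerException.
            throw new InvalidDataException(
                $"Row {ex.Context.Parser.RawRow} has an unexpected number of columns. " +
                "Did you forget to quote a description that contains commas?",
                ex);
        }
    }
}
```

Any other `BadDataException` passes through the filter untouched, so unrelated parsing problems are not reported as a quoting mistake.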

David Specht
  • Very helpful, thank you! May I ask how you knew this answer? Did I miss any documentation, or must one reverse engineer the implemented CsvHelper code to understand how to use this? – colbster Oct 28 '22 at 20:37
  • To clarify my confusion, the word `Detect` in `DetectColumnCountChanges` implies for me that a soft, neutral (FYI) reaction is requested, as if it were an `event` simply informing the caller of a detected change. Contrast that with stronger language used in properties such as `LineBreakInQuotedFieldIsBadData` and `AllowComments`. Does `DetectDelimiterValues` also throw because it also uses the word `Detect`? Without descriptions, the names alone leave too much room for misinterpretation. – colbster Oct 28 '22 at 20:47
  • I'll answer my own side question. I found it here: https://github.com/JoshClose/CsvHelper/blob/master/src/CsvHelper/Configuration/IReaderConfiguration.cs – colbster Oct 28 '22 at 22:19
  • @colbster I just set `DetectColumnCountChanges`, ran the sample and noticed that it threw a `BadDataException`. I also knew that `CsvHelper` tends to throw exceptions for things like this. But I would agree, better naming conventions would be helpful. – David Specht Oct 31 '22 at 12:18