Is there any way to use LinqToCSV and select only specific columns?

For example, I need to ingest a CSV file every day that might have 14 columns one month and 15 the next. At the moment I've configured it to map all 14 columns, but this isn't ideal because there are only 10 that I actually care about.

Because of this, when an extra column is added, a TooManyDataFieldsException is thrown and LinqToCSV won't read any lines of the CSV file.

casperOne
Matt Dell
  • Do the columns always have the same names, although sometimes some are missing? – Gert Arnold Sep 11 '12 at 21:20
  • Did you finally solve it? I faced the same issue in the past, and I modified the source code and added some useful properties to the `CsvFileDescription` class (like `ReadOnlySpecifiedColumns`, `Append`, etc.) – Oscar Mederos Feb 15 '13 at 14:02

4 Answers

See the EnforceCsvColumnAttribute property in the LINQ to CSV documentation: http://www.codeproject.com/Articles/25133/LINQ-to-CSV-library#EnforceCsvColumnAttribute. It says:

When true, Read only reads data fields into public fields and properties with the [CsvColumn] attribute, ignoring all other fields and properties. And, Write only writes the contents of public fields and properties with the [CsvColumn] attribute.
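As a rough sketch of what that description means in practice, the class below marks only two members with [CsvColumn]. The `Product` type, its members, the column names and the `ProductReader` helper are all hypothetical and not taken from the question; only `CsvColumn`, `CsvFileDescription` and `CsvContext.Read` come from the library.

```csharp
using System.Collections.Generic;
using LINQtoCSV;

// Hypothetical record type; member and column names are illustrative only.
public class Product
{
    [CsvColumn(Name = "Name")]
    public string Name { get; set; }

    [CsvColumn(Name = "Quantity")]
    public int Quantity { get; set; }

    // No [CsvColumn] attribute, so this member is ignored
    // when EnforceCsvColumnAttribute is true.
    public string InternalNote { get; set; }
}

// Hypothetical helper that wraps the read call.
public static class ProductReader
{
    public static IEnumerable<Product> Read(string path)
    {
        var description = new CsvFileDescription
        {
            SeparatorChar = ',',
            FirstLineHasColumnNames = true,
            EnforceCsvColumnAttribute = true
        };

        return new CsvContext().Read<Product>(path, description);
    }
}
```

Note that this setting controls which class members take part in the mapping; it does not by itself say anything about extra columns in the file.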

avs099

It seems that the IgnoreUnknownColumns property does the job.

Here is the code I use:

    /// <summary>
    /// The input file without header.
    /// </summary>
    private readonly CsvFileDescription inputFileWithoutHeader = new CsvFileDescription
    {
        SeparatorChar = ',',
        FirstLineHasColumnNames = false,
        EnforceCsvColumnAttribute = true,
        IgnoreUnknownColumns = true
    };

    /// <summary>
    /// The input file with headers.
    /// </summary>
    private readonly CsvFileDescription inputFileWithHeaders = new CsvFileDescription
    {
        SeparatorChar = ',',
        FirstLineHasColumnNames = true,
        EnforceCsvColumnAttribute = false,
        IgnoreUnknownColumns = true
    };

    /// <summary>
    /// Reads the distinct list items from every matching CSV file.
    /// </summary>
    /// <returns>The distinct <see cref="ListItem"/> rows read from the files.</returns>
    public IEnumerable<ListItem> ListItems()
    {
        // EnumerateFiles skips directories, which EnumerateFileSystemEntries would also return.
        return
            Directory.EnumerateFiles(this.path, "ListItem*.csv")
                .SelectMany(chkLstFile => this.csvContext.Read<ListItem>(chkLstFile, this.inputFileWithoutHeader)).Distinct();
    }

Then I retrieve my data from my repository:

    var myItems = myClassInstance.ListItems().CatchExceptions(ex => Debug.WriteLine(ex.Message));

For more control, I use an extension method to handle errors, inspired by the question "Wrap an IEnumerable and catch exceptions":

    public static IEnumerable<T> CatchExceptions<T>(this IEnumerable<T> src, Action<Exception> action = null)
    {
        using (var enumerator = src.GetEnumerator())
        {
            var next = true;

            while (next)
            {
                try
                {
                    next = enumerator.MoveNext();
                }
                catch (AggregatedException ex)
                {
                    lock (ex)
                    {
                        foreach (var e in ex.m_InnerExceptionsList)
                        {
                            if (action != null)
                            {
                                action(e);
                            }

                            File.AppendAllText(LogFilePath, string.Format("{0}: {1}\r\n", DateTime.Now.ToShortTimeString(), e.Message)); //todo ILogger
                        }
                    }

                    File.AppendAllText(LogFilePath, "-\r\n");
                    continue;
                }
                catch (Exception ex)
                {
                    if (action != null)
                    {
                        action(ex);
                    }

                    lock (ex)
                    {
                        File.AppendAllText(LogFilePath, string.Format("{0}: {1}\r\n", DateTime.Now.ToShortTimeString(), ex.Message)); //todo ILogger
                    }

                    continue;
                }

                if (next)
                {
                    yield return enumerator.Current;
                }
            }
        }
    }
Jean F.

Try implementing the IDataRow interface; see the "Reading Raw Data Rows" section of the documentation.
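A minimal sketch of that approach, following the pattern the documentation uses (a class deriving from `List<DataRowItem>` and implementing `IDataRow`). The `RawRow` and `RawReader` names and the `wantedIndexes` parameter are hypothetical; the idea is to read every field as text and keep only the positions you care about.

```csharp
using System.Collections.Generic;
using System.Linq;
using LINQtoCSV;

// Raw row type: List<DataRowItem> already provides the members IDataRow asks for.
public class RawRow : List<DataRowItem>, IDataRow
{
}

// Hypothetical helper, not part of the library.
public static class RawReader
{
    // wantedIndexes: zero-based positions of the fields to keep.
    public static IEnumerable<string[]> Read(string path, int[] wantedIndexes)
    {
        var description = new CsvFileDescription
        {
            SeparatorChar = ',',
            // With false, every line is returned as data, so a header line
            // (if the file has one) should come back as the first row.
            FirstLineHasColumnNames = false
        };

        return new CsvContext()
            .Read<RawRow>(path, description)
            .Select(row => wantedIndexes
                // Rows shorter than expected yield null for the missing field.
                .Select(i => i < row.Count ? row[i].Value : null)
                .ToArray());
    }
}
```

Because nothing is mapped to typed members here, a row with an extra trailing column is simply a longer list, at the cost of doing your own parsing and validation.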

cordialgerm

You need IgnoreUnknownColumns: http://www.codeproject.com/Articles/25133/LINQ-to-CSV-library#IgnoreUnknownColumns
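A minimal sketch of how this might look, assuming a library version that includes `IgnoreUnknownColumns` and a file whose first line holds the column names. The `DailyRow` type, its column names and the `DailyFileReader` helper are hypothetical; the class declares only the columns of interest.

```csharp
using System.Collections.Generic;
using LINQtoCSV;

// Hypothetical row type listing only the columns you care about.
public class DailyRow
{
    [CsvColumn(Name = "Id")]
    public int Id { get; set; }

    [CsvColumn(Name = "Name")]
    public string Name { get; set; }
}

// Hypothetical helper that wraps the read call.
public static class DailyFileReader
{
    public static IEnumerable<DailyRow> Read(string path)
    {
        var description = new CsvFileDescription
        {
            SeparatorChar = ',',
            FirstLineHasColumnNames = true, // columns are matched by header name
            IgnoreUnknownColumns = true     // headers with no matching member are skipped
        };

        return new CsvContext().Read<DailyRow>(path, description);
    }
}
```

With this setup, a file that gains a 15th column next month should still load, since the new header has no matching member and is skipped.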