I am trying to find an example of how to read a CSV file using LINQ. The examples I have found so far all assume the CSV file is stored on the local machine, but I am pulling mine from Azure. Here is the example I have found so far:

var stuff = from l in File.ReadLines(filename)
            let x = l.Split(new[] { ',', ' ' }, StringSplitOptions.RemoveEmptyEntries)
                     .Skip(1)
                     .Select(s => int.Parse(s))
            select new
            {
                Sum = x.Sum(),
                Average = x.Average()
            };

The problem is that, to pull the file from Azure, I have to use DownloadToStream and copy the file into a MemoryStream. When I am working with a MemoryStream, what should replace "File.ReadLines(filename)"?

user1790300
  • Unless you have control over the code generating the CSV, you probably want a dedicated CSV library for the parsing step. There are a lot more gotchas than you would think when it comes to CSVs – Jason Watkins May 22 '15 at 22:25
  • Take a look at [TextFieldParser](https://msdn.microsoft.com/en-us/library/microsoft.visualbasic.fileio.textfieldparser%28v=vs.110%29.aspx?f=255&MSPPError=-2147217396) - it can work on streams and handle CSV files. It's in the `Microsoft.VisualBasic.FileIO` namespace (and yes, it can be used from C# despite its name). See [TextFieldParser Constructor (Stream)](https://msdn.microsoft.com/en-us/library/ms128082(v=vs.110).aspx) as well. – Tim May 22 '15 at 22:28
  • Can TextFieldParser also confirm that the file is indeed a CSV, as opposed to a regular text file, and handle any other pitfalls that might come up? If not, could someone recommend a good CSV library that addresses these concerns and provides good performance? – user1790300 May 22 '15 at 23:47
  • @user1790300 - What do you mean confirm the file is indeed a CSV file? Based on what criteria? – Tim May 23 '15 at 00:11
  • @Tim, is there a way to confirm the format is indeed CSV? – user1790300 May 24 '15 at 16:49
  • @user1790300 - I ask again, how do you confirm it's a CSV file? *What are the criteria?* A CSV file is a text file that has rows (lines) of data with the fields separated by commas. I'm not sure what exactly you're looking for. – Tim May 24 '15 at 17:19
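
As a rough sketch of the TextFieldParser suggestion from the comments above, the following reads comma-delimited rows directly from any readable Stream (a MemoryStream included). `CsvRows` and `Read` are hypothetical names chosen for illustration; the parser members used are from `Microsoft.VisualBasic.FileIO`. It assumes comma delimiters and quoted fields, which may not match every file.

```csharp
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

// Hypothetical helper, not part of any library.
public static class CsvRows
{
    // Lazily yields one string[] per CSV record read from the stream.
    public static IEnumerable<string[]> Read(Stream stream)
    {
        using (var parser = new TextFieldParser(stream))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            // Assumption: fields may be wrapped in double quotes.
            parser.HasFieldsEnclosedInQuotes = true;

            while (!parser.EndOfData)
            {
                // ReadFields throws MalformedLineException on a line it cannot parse.
                yield return parser.ReadFields();
            }
        }
    }
}
```

Note that a malformed line surfaces as an exception only when that row is reached during enumeration, so this is not an up-front check that the whole file is CSV.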

1 Answer

Once you have the data in a Stream, there are many libraries that make reading CSV easy. You'll want to avoid parsing with Split(): CSV allows quoted fields that contain commas, escaped quotes, and embedded line breaks, so a naive split is easy to get wrong.
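
To illustrate the Split() concern, here is a minimal sketch in which a single record with three fields is split into four pieces because one field contains a quoted comma. `SplitPitfall` and `NaiveFieldCount` are hypothetical names used only for this demonstration.

```csharp
using System;

// Hypothetical demonstration class, not part of any library.
public static class SplitPitfall
{
    // Counts "fields" the naive way, by splitting on every comma.
    public static int NaiveFieldCount(string line)
    {
        return line.Split(',').Length;
    }

    // A record with three fields; the second contains a comma inside quotes.
    public const string Sample = "1,\"Smith, John\",42";
}
```

Calling `SplitPitfall.NaiveFieldCount(SplitPitfall.Sample)` returns 4 rather than 3, because the comma inside the quoted name is treated as a separator.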

One library to do this with is the Ctl.Data NuGet package:

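using System;
using System.Collections.Generic;
using System.IO;
// CloudBlockBlob; assumes the WindowsAzure.Storage client library.
using Microsoft.WindowsAzure.Storage.Blob;

class MyPoco
{
    // CSV file must have a header with these property names.
    public int Foo { get; set; }
    public string Bar { get; set; }
    public DateTime Baz { get; set; }

    // Streams the blob and lazily yields one MyPoco per CSV record.
    public static IEnumerable<MyPoco> Read(CloudBlockBlob blob)
    {
        // OpenRead reads from the blob directly, so no intermediate MemoryStream is needed.
        using (Stream s = blob.OpenRead())
        using (StreamReader sr = new StreamReader(s))
        {
            foreach (MyPoco x in Ctl.Data.Formats.Csv.ReadObjects<MyPoco>(sr))
            {
                yield return x;
            }
        }
    }
}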

(caveat: I'm the author of this package)
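
If you would rather keep the LINQ query from the question as-is, a stream-based stand-in for File.ReadLines can be sketched with a StreamReader. `StreamLines`, `ReadLines`, and `Summarize` below are hypothetical names, not part of any library, and the query inherits all the Split() limitations mentioned above; skipping blank lines is an added assumption so that Average() is never called on an empty sequence.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper, not part of any library.
public static class StreamLines
{
    // Lazily yields each line of a readable stream, much as File.ReadLines does for a path.
    public static IEnumerable<string> ReadLines(Stream stream)
    {
        using (var reader = new StreamReader(stream))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                yield return line;
            }
        }
    }

    // The question's query, reading from a stream instead of a file name.
    public static List<(int Sum, double Average)> Summarize(Stream stream)
    {
        var query = from l in ReadLines(stream)
                    where !string.IsNullOrWhiteSpace(l) // assumption: ignore blank lines
                    let x = l.Split(new[] { ',', ' ' }, StringSplitOptions.RemoveEmptyEntries)
                             .Skip(1)
                             .Select(s => int.Parse(s))
                    select (Sum: x.Sum(), Average: x.Average());

        // Materialize before the reader (and the underlying stream) is disposed.
        return query.ToList();
    }
}
```

When the MemoryStream has just been filled by DownloadToStream, its position is at the end, so set `Position = 0` before passing it to `ReadLines`; otherwise no lines are returned.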

Cory Nelson
  • Can your CSV library also confirm that the file is indeed a CSV, as opposed to a regular text file, and handle any other pitfalls that might come up? – user1790300 May 22 '15 at 23:48
  • The library will throw an exception if the input is not valid CSV, and beyond just the format you can use annotation-based property validation to check that the input is within expected parameters. – Cory Nelson May 23 '15 at 00:14