I'm trying to extract some data from a CSV file using the following U-SQL EXTRACT statement:

EXTRACT SessionId   string,
        Latitude    double,
        Longitude   double,
        Timestamp   int
FROM "wasb://sessions@myaccount.blob.core.windows.net/"
USING Extractors.Csv();

But my job fails partway through because one row has the wrong number of columns and so doesn't fit this schema (common in huge datasets). How do I keep a single bad row from failing the entire extract?

outside2344

2 Answers

Note that the silent flag will do the following:

  1. Ignore rows that have mismatched column counts.
  2. Replace invalid values with null if the column type is nullable.

It will still error if:

  1. The value cannot be cast to the expected non-nullable type.
  2. There is an invalid character for the specified encoding.
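
To make use of the null replacement described above, the numeric columns would need to be declared as nullable types. The following is only a sketch, assuming the schema from the question: the rowset names @sessions and @valid and the output path are hypothetical, and the filter step is one possible way to drop rows whose values could not be converted.

```
// Declare numeric columns as nullable so that, with silent:true,
// invalid values become null instead of failing the job.
@sessions =
    EXTRACT SessionId   string,
            Latitude    double?,
            Longitude   double?,
            Timestamp   int?
    FROM "wasb://sessions@myaccount.blob.core.windows.net/"
    USING Extractors.Csv(silent:true);

// Keep only rows where every numeric value was parsed successfully.
@valid =
    SELECT SessionId,
           Latitude.Value AS Latitude,
           Longitude.Value AS Longitude,
           Timestamp.Value AS Timestamp
    FROM @sessions
    WHERE Latitude != null AND Longitude != null AND Timestamp != null;

// Hypothetical output location, shown only to complete the script.
OUTPUT @valid
TO "/output/valid_sessions.csv"
USING Outputters.Csv();
```

string is already nullable in U-SQL, so only the double and int columns change. An invalid character for the specified encoding would still fail the job, as noted above.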
Michael Rys
  • Is there a mechanism you can use to ignore rows with the "still error" conditions above? – outside2344 Jun 07 '16 at 20:25
  • You would have to write your own custom extractor. If you have specific scenarios, feel free to also add a feature request on the built-in extractors at http://aka.ms/adlfeedback. – Michael Rys Jun 07 '16 at 21:03
Use the silent:true parameter to Extractors.Csv(), like so:

EXTRACT SessionId   string,
        Latitude    double,
        Longitude   double,
        Timestamp   int
FROM "wasb://sessions@myaccount.blob.core.windows.net/"
USING Extractors.Csv(silent:true);
outside2344