3

Is there any way in the U-SQL extract statement to only specify the input columns that I care about? I'm working with a legacy database that exports several tables to csv that has about 200 columns. I only care about 10 of those fields. I was hoping to do something like:

EXTRACT CustomerID:0   string,
    StoreNumber:5    double,
    ReceiptNumber:20   double,
    Timestamp:125   int
FROM "somefile.csv"
USING Extractors.Csv();

So the idea would be to specify an index that corresponds to which column the data comes from.

Is there anything like that in the works? Or do I always need to define all 200 columns? I'm open to other solutions as well. Thanks!

Dan
  • 3,583
  • 1
  • 23
  • 18

2 Answers2

3

You can write your custom extractor code. Use the sample extractors at https://github.com/Azure/usql/tree/master/Examples/DataFormats/Microsoft.Analytics.Samples.Formats as guidance.

  • 1
    Although it's not possible to do what I described above, it's easy enough through a custom extractor as you suggested. Thanks! – Dan Dec 27 '16 at 15:07
  • Haven't the solution still not introduced in U-SQL as an in-built feature instead of custom extractor? – Vikash Pareek Oct 04 '18 at 15:17
1

There isn't an easy way to do this as you describe, but it would not take long to script. For example I often use Excel to help with scripting, eg paste the known metadata for the 200 column csv then map it to U-SQL data types, eg

Excel U-SQL helper

You could create an initial U-SQL view that had all the columns using the method above, then create another view based on the initial view for the columns you want:

CREATE VIEW IF NOT EXISTS dbo.view1
    AS
EXTRACT col1 int,
        col2 string,
        col3 string,
        col4 string,
        col5 string
FROM "/input/input44.txt"
USING Extractors.Csv();


CREATE VIEW IF NOT EXISTS dbo.view2
    AS
SELECT  col1,
        col2
FROM dbo.view1;


@input =
    SELECT *
    FROM dbo.view2;

I also sometimes use Visual Studio Find and Replace with RegEx to do similar things.

wBob
  • 13,710
  • 3
  • 20
  • 37