3

I need to populate a rowset with all the dates between a defined Start date and End Date. If my start date is 19/7/2017 and the end date is 21/7/2017 then the rowset should contain 19/7/2017, 20/7/2017 and 21/7/2017.

I was wondering if there was an easy way to do this using U-SQL

4 Answers4

5

The easiest way to do this would be to export your favourite date dimension from your favourite warehouse and import it into a U-SQL table.

You could also do this using custom U-SQL code, something like this:

DECLARE @outputFilepath string = "output/output74.csv";

//DECLARE @startDate DateTime = DateTime.Parse("19/7/2017");
//DECLARE @endDate DateTime = DateTime.Parse("21/7/2017");

DECLARE @startDate DateTime = DateTime.Parse("1/1/2000");
DECLARE @endDate DateTime = DateTime.Parse("31/12/2017");


// User-defined appliers
// Take one row and produce 0 to n rows
// Used with OUTER/CROSS APPLY
@output =
    SELECT outputDate
    FROM(
        VALUES ( 1 ) 
        ) AS dummy(x)
        CROSS APPLY new USQLtpch.makeDateRange (@startDate, @endDate) AS properties(outputDate DateTime);


OUTPUT @output
TO @outputFilepath
USING Outputters.Tsv();

The code-behind file:

using Microsoft.Analytics.Interfaces;
using Microsoft.Analytics.Types.Sql;
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;

namespace USQLtpch
{

    [SqlUserDefinedApplier]
    public class makeDateRange : IApplier
    {
        private DateTime startDate;
        private DateTime endDate;

        public makeDateRange(DateTime startDate, DateTime endDate)
        {
            this.startDate = startDate;
            this.endDate = endDate;
        }

        public override IEnumerable<IRow> Apply(IRow input, IUpdatableRow output)
        {

            // Initialise
            DateTime outputDate = this.startDate;


            // Loop until date range has been filled out
            while (outputDate <= endDate)
            {
                output.Set<DateTime>("outputDate", outputDate);

                // Increment date
                outputDate = outputDate.AddDays(1);

                yield return output.AsReadOnly();

            }
        }
    }
}

I've done this using a custom Applier which takes 1 row and converts it to 0 or n.

wBob
  • 13,710
  • 3
  • 20
  • 37
4

We always recommend that developers investigate using the pure U-SQL approach first instead of using C# UDOs, here is another way of accomplishing this task.

First, consider how you would just get a list of numbers in U-SQL

@numbers_10 = 
    SELECT
        *
    FROM 
    (VALUES
        (0),
        (1),
        (2),
        (3),
        (4),
        (5),
        (6),
        (7),
        (8),
        (9)
    ) AS T(Value);

That's just 10 numbers - 0 to 9. We can use CROSS JOIN to expand the list.

@numbers_100 = 
    SELECT (a.Value*10 + b.Value) AS Value
    FROM @numbers_10 AS a 
        CROSS JOIN @numbers_10 AS b;

Now we have 0 to 99. We can use CROSS JOIN to generate even more numbers.

@numbers_10000 = 
    SELECT (a.Value*100 + b.Value) AS Value
    FROM @numbers_100 AS a CROSS JOIN @numbers_100 AS b;

Then generate a list of dates from that.

DECLARE @StartDate = DateTime.Parse("1979-03-31");

...

@result = 
    SELECT 
        Value,
        @StartDate.AddDays( Value ) AS Date
    FROM @numbers_10000;

The full script looks like this:

DECLARE @StartDate = DateTime.Parse("1979-03-31");

@numbers_10 = 
    SELECT
        *
    FROM 
    (VALUES
        (0),
        (1),
        (2),
        (3),
        (4),
        (5),
        (6),
        (7),
        (8),
        (9)
    ) AS T(Value);

@numbers_100 = 
    SELECT (a.Value*10 + b.Value) AS Value
    FROM @numbers_10 AS a CROSS JOIN @numbers_10 AS b;

@numbers_10000 = 
    SELECT (a.Value*100 + b.Value) AS Value
    FROM @numbers_100 AS a CROSS JOIN @numbers_100 AS b;

@result = 
    SELECT 
        Value,
        @StartDate.AddDays( Value ) AS Date
    FROM @numbers_10000;

OUTPUT @result TO "/res.csv" USING Outputters.Csv(outputHeader:true);

Once you have your list of numbers or dates, it might be convenient to save it into a U-SQL table so you can retrieve the list easily later.

saveenr
  • 8,439
  • 3
  • 19
  • 20
  • I always recommend that too! : ) – wBob Jul 25 '17 at 14:38
  • Hey @saveenr would you like me to delete my post as it doesn't represent best practice? As mentioned, I think the best way to do this is import your favourite date dimension, or even query it where it lives using federated queries. – wBob Jul 26 '17 at 10:49
  • @wbob - I think your post is great; please don't remove it. Developers should know about this practical application of an Applier UDO. It also makes for a very useful starting sample to build other Appliers. – saveenr Jul 30 '17 at 22:33
  • In my experience this runs slower than a `CROSS APPLY EXPLODE` to a `.net` `Enumerable.Range` as per my answer below. – iamdave Jan 10 '19 at 10:29
1

Step 1: You need to have a deterministic ordering of the rows in the row set for this to logically make sense. So figure out which column you want to order your rows by

Step 2: Get a row number assigned to each row. Here is an example of how https://msdn.microsoft.com/en-us/library/azure/mt763822.aspx

Step 3: You can use the row number that is assigned to each row combined with a C# expression to generate the date that should be for each row.

Amit Kulkarni
  • 910
  • 4
  • 11
1

This is a prime example of where the .Net elements of the U-SQL language can be used to great effect. In this case you can explode an Enumerable.Range to get an incrementing list of values that you can apply to your data:

DECLARE @startDate DateTime = DateTime.Parse("2000/01/01");
DECLARE @endDate DateTime = DateTime.Parse("2017/12/31");

@dates =
    SELECT d.DateValue
    FROM (VALUES(@startDate)) AS sd(s)
         CROSS APPLY    // EXPLODE creates a rowset from all the values in the given list
             EXPLODE(Enumerable.Range(0
                                     ,(@endDate - @startDate).Days
                                     )
                                     .Select(offset => sd.s.AddDays(offset))
                    ) AS d(DateValue)
    ;
iamdave
  • 12,023
  • 3
  • 24
  • 53