4

I'm trying to create a USQL job and defined my columns from the CSVs they will be retrieved from, however I'm always having issues on the JOIN portion, because the columns I am matching are of a different type. This is weird because I have defined them to be the same type. See screenshot of where the issue lies:

enter image description here

Here is the complete USQL:

@guestCheck = 
    EXTRACT GuestCheckID int,
            POSCheckGUID Guid,
            POSCheckNumber int?,
            OwnerEmployeeID int,
            CreatedDateTime DateTime?,
            ClosedDateTime DateTime?,
            TicketReference string,
            CheckAmount decimal?,
            POSTerminalID int,
            CheckState string,
            LocationID int?,
            TableID int?,
            Covers int?,
            PostedDateTime DateTime?,
            OrderChannelID int?,
            MealPeriodID int?,
            RVCLocationID int?,
            ReopenedTerminalID int?,
            ReopenedEmployeeID int?,
            ReopenedDateTime DateTime?,
            ClosedBusDate int?,
            PostedBusDate int?,
            BusHour byte?,
            TaxExempt bool?,
            TaxExemptReference string
    FROM "/GuestCheck/GuestCheck-incomplete.csv"
    USING Extractors.Csv();

@guestCheckAncillaryAmount =
    EXTRACT CheckAncillaryAmountID int,
            GuestCheckID int,
            GuestCheckItemID int?,
            AncillaryAmountTypeID int,
            Amount decimal,
            FirstDetail int?,
            LastDetail int?,
            IsReturn bool?,
            ReturnReasonID int?,
            AncillaryReasonID int?,
            AncillaryNote string,
            ClosedBusDate int?,
            PostedBusDate int?,
            BusHour byte?,
            LocationID int?,
            RVCLocationID int?,
            IsDelisted bool?,
            Exempted bool?
    FROM "/GuestCheck/GuestCheckAncillaryAmount.csv"
    USING Extractors.Csv();

@ancillaryAmountType = 
    EXTRACT AncillaryAmountTypeID int,
            AncillaryAmountCategoryID int,
            CustomerID int,
            CheckTitle string,
            ReportTitle string,
            Percentage decimal,
            FixedAmount decimal,
            IncludeOnCheck bool,
            AutoCalculate bool,
            StoreAtCheckLevel bool?,
            DateTimeModified DateTime?,
            CheckTitleToken Guid?,
            ReportTitleToken Guid?,
            DeletedFlag bool,
            MaxUsageQty int?,
            ApplyToBasePriceOnly bool?,
            Exclusive bool,
            IsItem bool,
            MinValue decimal,
            MaxValue decimal,
            ItemGroupID int?,
            LocationID int,
            ApplicationOrder int?,
            RequiresReason bool,
            Exemptable bool?
    FROM "/GuestCheck/AncillaryAmountType.csv"
    USING Extractors.Csv();

@read =
    SELECT t.POSCheckGUID,
           t.POSCheckNumber,
           t.CheckAmount,
           aat.AncillaryAmountTypeID,
           aat.CheckTitle,
           gcd.Amount
    FROM @guestCheck AS t         
         LEFT JOIN
             @guestCheckAncillaryAmount AS gcd
         ON t.GuestCheckID == gcd.GuestCheckID
         LEFT JOIN
             @ancillaryAmountType AS aat
         ON gcd.AncillaryAmountTypeID == aat.AncillaryAmountTypeID
    WHERE aat.AncillaryAmountCategoryID IN(2, 4, 8);

OUTPUT @read
TO "/GuestCheckOutput/output.csv"
USING Outputters.Csv();
GregGalloway
  • 11,355
  • 3
  • 16
  • 47
AnimaSola
  • 7,146
  • 14
  • 43
  • 62

2 Answers2

3

Indeed, U-SQL is strongly typed, and int and int? are different types. You would need to cast in an intermediate rowset:

@ancillaryAmountType2 =
SELECT (int?) aat.AncillaryAmountTypeID AS AncillaryAmountTypeID,
       aat.AncillaryAmountCategoryID,
       aat.CheckTitle
FROM @ancillaryAmountType AS aat;

Or, better, use dimensional modeling best practice, and avoid nullable "dimensions" for the reasons stated in http://blog.chrisadamson.com/2013/01/avoid-null-in-dimensions.html.

  • Appreciate the response Alexandre. However, AncillaryAmountTypeID in both guestCheckAncillaryAmount and ancillaryAmountType are defined as int. So why is it saying that one is int? when I never declared it to be that way. – AnimaSola Dec 01 '16 at 16:37
3

This is not to do with the nullability of the columns as specified in the EXTRACT table definition, because as the OP has shown in their code, neither of the join columns are specified as null (ie with ?) in the EXTRACT definition. This is do with the multiple outer joins and what is known as a null-supplying table.

If you think about it logically, imagine you have three tables, TableA had 3 records, TableB has two records and TableC has one record, something like this:

Tables

If you start with tableA and left outer join to tableB you instinctively know you will get three records but the column x will be null for tableB column x; this is your null-supplying table and where the nullability is coming from.

Thankfully the fix is the same; change the nullability of the column earlier on or specify substitued values, eg -1.

@t3 =
    SELECT (int?) x AS x, 2 AS a
    FROM dbo.tmpC;

// OR

// Use conditional operator to supply substitute values
@t3 =
    SELECT x == null ? -1 : x AS x, 2 AS a
    FROM dbo.tmpC;

However there is another problem with your particular query. In most relational databases, adding a WHERE clause to a table on the right-hand side of a left outer join converts the join to an inner join and it's the same in U-SQL. You might want to think about the real result you are trying to get and consider rewriting your query.

HTH

wBob
  • 13,710
  • 3
  • 20
  • 37