
I'm looking for the equivalent Databricks code for the query below. I've added some sample code and the expected results as well, but in particular I'm looking for the Databricks equivalent of the query. At the moment I'm stuck on the CROSS APPLY STRING_SPLIT part.

Sample SQL data:

    CREATE TABLE FactTurnover
    (
        ID INT,
        SalesPriceExcl NUMERIC (9,4),
        Discount VARCHAR(100)
    )

    INSERT INTO FactTurnover
    VALUES
        (1, 100, '10'),
        (2, 39.5877, '58, 12'),
        (3, 100, '50, 10, 15'),
        (4, 100, 'B')
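
For reference, the same sample data can be loaded into a Databricks notebook with Spark SQL along these lines (just a sketch; NUMERIC maps to DECIMAL in Spark SQL):

    %sql
    -- Sketch: recreate the sample data as a temporary view
    CREATE OR REPLACE TEMP VIEW FactTurnover AS
    SELECT * FROM VALUES
        (1, CAST(100     AS DECIMAL(9,4)), '10'),
        (2, CAST(39.5877 AS DECIMAL(9,4)), '58, 12'),
        (3, CAST(100     AS DECIMAL(9,4)), '50, 10, 15'),
        (4, CAST(100     AS DECIMAL(9,4)), 'B')
    AS FactTurnover(ID, SalesPriceExcl, Discount)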

Query:

    ;WITH CTE AS
    (
        SELECT Id, SalesPriceExcl,
               CASE WHEN value = 'B' THEN 0
                    ELSE CAST(value AS INT) END AS Discount
        FROM FactTurnover
        CROSS APPLY STRING_SPLIT(Discount, ',')
    )
    SELECT Id,
           MIN(SalesPriceExcl) AS SalesPriceExcludingDiscount,
           EXP(SUM(LOG((100 - Discount) / 100.0))) AS TotalDiscount,
           CAST(EXP(SUM(LOG((100 - Discount) / 100.0))) *
                MIN(SalesPriceExcl) AS NUMERIC(9,2)) AS PriceAfterDiscount
    FROM CTE
    GROUP BY Id
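
(For clarity: the EXP(SUM(LOG(...))) construct turns the sum of logs back into a product, i.e. it multiplies the per-row discount factors. For Id 3 that is (1 - 0.50) * (1 - 0.10) * (1 - 0.15) = 0.3825, which is the TotalDiscount below; the 'B' value is treated as a 0% discount.)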

Expected Results:

| Id | SalesPriceExcludingDiscount |       TotalDiscount | PriceAfterDiscount |
|----|-----------------------------|---------------------|--------------------|
|  1 |                         100 |                 0.9 |                 90 |
|  2 |                     39.5877 | 0.36960000000000004 |              14.63 |
|  3 |                         100 | 0.38250000000000006 |              38.25 |
|  4 |                         100 |                   1 |                100 |
  • What's the issue? Databricks supports SQL, right? – kjmerf Sep 18 '19 at 14:20
  • Yes, but not every command is recognized in Databricks. When you start the command with %sql, it does not recognize all of the functions; CROSS APPLY is one of them. – Yanni Pattas Sep 18 '19 at 14:25
  • @kjmerf I think only [Spark SQL](https://docs.databricks.com/spark/latest/spark-sql/index.html), so you can't copy paste other sql dialects in there – PHPirate Sep 18 '19 at 14:28
  • Yes indeed. That's it. It's the Spark equivalent I'm looking for. – Yanni Pattas Sep 18 '19 at 14:40
  • You'll need to use split and explode. Something like `discounts.withColumn("SalesPriceExcl",explode(split($"SalesPriceExcl",",")))`. That will give you a row per value in the array, which should basically replace the cross apply – Andrew Sep 18 '19 at 14:42
  • How many values can be in the discount string? – kjmerf Sep 18 '19 at 15:38

1 Answer


Use SPLIT to convert the comma-separated string to an array, then use LATERAL VIEW and EXPLODE to operate on the elements of that array. The roughly equivalent syntax (including CTEs) is:

    %sql
    --SELECT * FROM FactTurnover;

    WITH cte AS
    (
    SELECT *
    FROM
      (
      SELECT Id, SalesPriceExcl, SPLIT ( Discount, ',' ) AS discountArray
      FROM FactTurnover
      ) x
      LATERAL VIEW EXPLODE ( discountArray ) x AS xdiscount
    )
    SELECT
      Id,
      MIN(SalesPriceExcl) AS SalesPriceExcludingDiscount,
      EXP ( SUM( LOG( ( 100 - xdiscount ) / 100.00 ) ) ) AS TotalDiscount
    FROM cte
    GROUP BY Id
    ORDER BY Id
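
If you also need the 'B' handling and the PriceAfterDiscount column from the original T-SQL, a sketch along the same lines (untested; it assumes Spark's CAST of the trimmed string to INT behaves like the T-SQL cast) could be:

    %sql
    WITH cte AS
    (
    SELECT
      Id,
      SalesPriceExcl,
      -- treat 'B' as a 0% discount, otherwise cast the split value to INT
      CASE WHEN TRIM ( xdiscount ) = 'B' THEN 0 ELSE CAST ( TRIM ( xdiscount ) AS INT ) END AS Discount
    FROM
      (
      SELECT Id, SalesPriceExcl, SPLIT ( Discount, ',' ) AS discountArray
      FROM FactTurnover
      ) x
      LATERAL VIEW EXPLODE ( discountArray ) d AS xdiscount
    )
    SELECT
      Id,
      MIN(SalesPriceExcl) AS SalesPriceExcludingDiscount,
      EXP ( SUM( LOG( ( 100 - Discount ) / 100.00 ) ) ) AS TotalDiscount,
      CAST ( EXP ( SUM( LOG( ( 100 - Discount ) / 100.00 ) ) ) * MIN(SalesPriceExcl) AS DECIMAL(9,2) ) AS PriceAfterDiscount
    FROM cte
    GROUP BY Id
    ORDER BY Id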

If you are feeling brave, you could also do this using higher-order functions. I've included two examples below. I'd say they are harder to debug, and you should compare them performance-wise; it depends what you're comfortable with:

    %sql
    -- Convert Discount text column to array with SPLIT function and filter out value 'B' from the array
    WITH filterB AS (
    SELECT *, FILTER ( SPLIT ( Discount, ',' ), x -> x != 'B' ) discountArray
    FROM FactTurnover
    ), cte1 AS (
    -- Do initial calcs on each array element
    SELECT
      Id,
      TRANSFORM ( discountArray, d -> LOG( ( 100 - d ) / 100.00 ) ) discountArray2
    FROM filterB
    )
    SELECT
      Id,
      EXP( AGGREGATE ( discountArray2, CAST( 0 AS DOUBLE ), ( x, y ) -> x + y ) ) AS TotalDiscount
    FROM cte1;

    -- all in one example
    SELECT
      Id,
      EXP( AGGREGATE( TRANSFORM( FILTER ( SPLIT ( Discount, ',' ), x -> x != 'B' ), y -> LOG( ( 100 - y ) / 100.00 ) ), CAST( 0 AS DOUBLE ), ( z, a ) -> z + a ) ) AS final
    FROM FactTurnover
    ORDER BY Id
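
Against the sample data, both higher-order variants should reproduce the TotalDiscount column from the expected results: for the 'B' row the FILTER leaves an empty array, so AGGREGATE returns its initial value of 0 and EXP(0) = 1.
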
wBob