I am trying to join data from two tables by nearest earliest date. The first tables dates are more frequent than the seconds dates. I came across a few links and it seems like a Lateral Join may provide what I need:
- https://jonmce.medium.com/what-the-heck-is-a-lateral-join-anyway-4c3345b94a63
- SQL: join with an equal key and nearest key (similar to Pandas's merge as of)
The first table, (Table A) and is the correct dimension i.e number of rows.
date | code | multiplier |
---|---|---|
2018 - 01 -01 | c1 | .3 |
2018 - 01 -01 | c2 | .5 |
2018 - 01 -01 | c3 | .2 |
2018 - 06 -01 | c1 | .4 |
2018 - 06 -01 | c1 | .4 |
2018 - 06 -01 | c3 | .2 |
2019 - 01 -01 | c1 | 1 |
2019 - 06 -01 | c1 | .5 |
2019 - 06 -01 | c2 | .5 |
2020 - 01 -01 | c1 | .3 |
2020 - 01 -01 | c2 | .5 |
2020 - 01 -01 | c3 | .2 |
However, the second table needs to be pivoted before it can be mapped on the first table.
date | Item | cost |
---|---|---|
2018 - 01 -01 | A | 1 |
2018 - 01 -01 | B | 2 |
2018 - 01 -01 | C | 2 |
2019 - 01 -01 | A | 2 |
2019 - 01 -01 | B | 3 |
2019 - 01 -01 | C | 4 |
2020 - 01 -01 | A | 4 |
2020 - 01 -01 | B | 4 |
2020 - 01 -01 | C | 5 |
My attempt at using LATERAL JOIN...
SELECT
A.*, C.*
FROM (
SELECT
`j`.`id`,
`j`.`date`,
`k`.`key`,
`k`.`value`
FROM `j`
INNER JOIN `k` ON `j`.`id` = `k`.`id`
WHERE `j`.`guid` = '<filter>'
) AS A
LEFT JOIN LATERAL (
SELECT `B`.*
FROM (
SELECT
'rate'.`date`,
MIN( CASE WHEN `rate`.`item` = 'itemA' THEN `rate`.`cost` END) 'itemA',
MIN( CASE WHEN `rate`.`item` = 'itemB' THEN `rate`.`cost` END) 'itemB',
MIN( CASE WHEN `rate`.`item` = 'itemC' THEN `rate`.`cost` END) 'itemC'
FROM `rate`
GROUP BY `rate`.`date`) AS B
WHERE `B`.`date` <= `A`.`date`
ORDER BY `B`.`date` DESC LIMIT 1) AS `C` ON true
);
The result should just have the values of the pivoted tables new columns with some duplication. But I am getting a syntax error
Schema Error: Error: ER_PARSE_ERROR: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '( SELECT B
.* FROM ( SELECT 'rate'.date
, MIN'
date | code | multiplier | A | B | C | date2 |
---|---|---|---|---|---|---|
2018 - 01 -01 | c1 | .3 | 1 | 2 | 2 | 2018-01-01 |
2018 - 01 -01 | c2 | .5 | 1 | 2 | 2 | 2018-01-01 |
2018 - 01 -01 | c3 | .2 | 1 | 2 | 2 | 2018-01-01 |
2018 - 06 -01 | c1 | .4 | 1 | 2 | 2 | 2018-01-01 |
2018 - 06 -01 | c2 | .4 | 1 | 2 | 2 | 2018-01-01 |
2018 - 06 -01 | c3 | .2 | 1 | 2 | 2 | 2018-01-01 |
I pretty sure I can do this using python and pandas or spark, but I'd prefer to keep this in SQL and I like the cleanliness of the Lateral Join. I do know that table 1 (A) works and so does Table B (The pivot). The issue is putting table B inside of the Lateral Join is where my issue is arising.