0

Let's say I have the following table with data:

sales:

id timestamp product price
1 2014-01-01 01:02:03 phone 14.99
2 2014-01-01 03:02:03 car 1200.00

And then we have transactions stored in a separate table

cdc:

type id timestamp product price
DELETE 1 2014-01-01 04:02:03
APPEND 3 2014-01-02 04:02:03 computer 799.00
UPDATE 3 2014-01-02 04:02:03 computer 805.00

Would it be possible to do a single query to get the 'current table' up to a current timestamp? For example, something like:

-- only takes into account APPENDS
SELECT * FROM sales WHERE timestamp > '2014-02-01 00:00:00'
UNION
SELECT * FROM sales WHERE type='APPEND' AND timestamp > '2014-02-01 00:00:00'

But also including UPDATEs and DELETEs? I suppose a procedural table function is fine as well here.

For example, the table up to current is:

id timestamp product price
2 2014-01-01 03:02:03 car 1200.00
3 2014-01-02 04:02:03 computer 805.00

(Any database dialect is fine here.)

Gnqz
  • 3,292
  • 3
  • 25
  • 35
David542
  • 104,438
  • 178
  • 489
  • 842
  • 1
    I suggest including the exact output you want to see here. I don't entirely follow your question (and I stare at SQL code for several hours each day). – Tim Biegeleisen Jun 28 '22 at 04:07
  • @TimBiegeleisen updated. It's just processing a cdc basically with a sql statement. – David542 Jun 28 '22 at 04:09
  • If you remove type='APPEND' from the WHERE clause you will include UPDATEs and DELETEs. Isn't this that you want? – forpas Jun 30 '22 at 15:09
  • Is there a case where directly extracting the last state of an item produces different result from performing *all* operations in the specified order. – Salman A Jul 04 '22 at 11:16
  • Sorry. I do not understand what you need to – CaptainPyscho Jul 04 '22 at 16:50

5 Answers5

2

PostgreSQL

I use CTE and LAST_VALUE window function to get last "state" of product in cdc table. It's assumed that the row order in cdc conforms time flow and the last one row for product meets it's natural state in the time.

I also use LAST_VALUE when aggregating data from the sales table and the last state of the product from the CTE, since a product can exists in both tables (I added another row to sales table to show the problem, but if you remove the top aggregate query there will be 2 rows for computer with different timestamp and price - one per sales and cdc tables - check out the second query in a demo. This problem exists in @Larnu's solution).

WITH last_transactions AS (
SELECT 
    DISTINCT id,
    product,
    LAST_VALUE(type) OVER (PARTITION BY id) AS type,
    LAST_VALUE(timestamp) OVER (PARTITION BY id) AS timestamp,
    LAST_VALUE(price) OVER (PARTITION BY id) AS price
FROM cdc
WHERE timestamp < '2014-02-01'
)
SELECT 
   DISTINCT ON (id, product)
   id,
   product,
   LAST_VALUE(timestamp) OVER (PARTITION BY id ORDER BY timestamp DESC) AS timestamp,
   LAST_VALUE(price) OVER (PARTITION BY id ORDER BY timestamp DESC) AS price
FROM (   
    SELECT 
        id, timestamp, product, price 
    FROM sales 
    WHERE timestamp < '2014-02-01' 
          AND NOT EXISTS (
              SELECT 
              FROM last_transactions 
              WHERE id = sales.id AND timestamp >= sales.timestamp
          )
    UNION ALL
    SELECT id, timestamp, product, price 
    FROM last_transactions 
    WHERE type != 'DELETE'
) t

Please, take a look at demo

Alexey
  • 2,439
  • 1
  • 11
  • 15
1

This is T-SQL (The DDL and DML is bespoke, as is inclusion of the schema), but this should work in most dialects, or need minimal changes. First, I would use a CTE to ROW_NUMBER by ID, to get the "latest" row to have a row number of 1. Then you could get the rows from the original table where no rows EXISTS in the CTE, and then UNION ALL to "Top 1 per group" row from the CTE:

SELECT *
INTO dbo.sales
FROM (VALUES(1,CONVERT(datetime2(0),'2014-01-01 01:02:03'),'phone',14.99),
            (2,CONVERT(datetime2(0),'2014-01-01 03:02:03'),'car',1200.00))V(id,timestamp,product,price)
GO

SELECT *
INTO dbo.cdc
FROM (VALUES('DELETE',1,CONVERT(datetime2(0),'2014-01-01 04:02:03'),NULL,NULL),
            ('APPEND',3,CONVERT(datetime2(0),'2014-01-02 04:02:03'),'computer',799.00),
            ('UPDATE',3,CONVERT(datetime2(0),'2014-01-02 04:02:03'),'computer',805.00))V(type,id,timestamp,product,price);
GO

WITH CTE AS(
    SELECT type,
           id,
           timestamp,
           product,
           price,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY timestamp DESC) AS RN
    FROM dbo.cdc
    WHERE Timestamp <= '2014-02-01T00:00:00')
SELECT id,
       timestamp,
       product,
       price
FROM dbo.sales s
WHERE [timestamp] <= '2014-02-01T00:00:00'
  AND NOT EXISTS (SELECT 1
                  FROM CTE C
                  WHERE C.id = s.id)
UNION ALL
SELECT id,
       timestamp,
       product,
       price
FROM CTE
WHERE type <> 'DELETE'
  AND RN = 1;


GO
DROP TABLE dbo.sales;
DROP TABLE dbo.cdc

A note that the 2 entries for id 3 have the same value for timestamp and there is nothing else to ORDER that data by. As such the row returned is arbitrary. Likely you should have an additional always ascending id column in that table, which you should add to the ORDER BY clause for ROW_NUMBER in descending order too (ORDER BY timestamp DESC, YourAscendingID DESC). This is why the above may return 799.00 for the value of id 3 (as they both have the same value for timestamp).

You could likely assume that an APPEND always occurs before an UPDATE, that always occurs before a DELETE, and then use a CASE expression, but if you have 2 UPDATEs at the same time, then which you get would be completely arbitrary. The CASE expression would be the following, but as mentioned, is flawed with 2 UPDATEs at the same timestamp:

CASE [type] WHEN 'APPEND' THEN 1
            WHEN 'UPDATE' THEN 2
            WHEN 'DELETE' THEN 3
END

Again, you would order the above in descending order.

Thom A
  • 88,727
  • 11
  • 45
  • 75
1

Oracle solution
With sample data as defined in the question:

WITH
    sales AS
        (
          Select 1 "ID", CAST(To_Date('2014-01-01 01:02:03', 'yyyy-mm-dd hh24_mi_ss') as TimeStamp) "DTTM", 'phone' "PRODUCT", 14.99 "PRICE" From DUAL  UNION ALL
          Select 2 "ID", CAST(To_Date('2014-01-01 03:02:03', 'yyyy-mm-dd hh24_mi_ss') as TimeStamp) "DTTM", 'car' "PRODUCT", 1200 "PRICE" From DUAL
        ),
    cdc AS
        (
          Select 'DELETE' "TP", 1 "ID", CAST(To_Date('2014-01-01 04:02:03', 'yyyy-mm-dd hh24_mi_ss') as TimeStamp) "DTTM", Null "PRODUCT", Null "PRICE" From DUAL  UNION ALL
          Select 'APPEND' "TP", 3 "ID", CAST(To_Date('2014-01-02 04:02:03', 'yyyy-mm-dd hh24_mi_ss') as TimeStamp) "DTTM", 'computer' "PRODUCT", 799.00 "PRICE" From DUAL  UNION ALL
          Select 'UPDATE' "TP", 3 "ID", CAST(To_Date('2014-01-02 04:02:03', 'yyyy-mm-dd hh24_mi_ss') as TimeStamp) "DTTM", 'computer' "PRODUCT", 805.00 "PRICE" From DUAL
        ),

Create event cards for IDs. Add total number of events and event order number to the cards using analytic functions partitioned by ID and ordered by ID and DTTM (in case of order number).
Event cards are shown at the end of SQL below:

    event_cards AS
        (   Select Count(ID) OVER(PARTITION BY ID ORDER BY ID) "TOTAL_EVENTS",
                  Count(ID) OVER(PARTITION BY ID ORDER BY ID, DTTM ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) "EVENT_NUMBER",
                  ID, TP, DTTM, Max(PRODUCT) "PRODUCT", PRICE "PRICE"
            From (Select 'INIT' "TP", s.ID, s.DTTM "DTTM", Max(s.PRODUCT) "PRODUCT", s.PRICE "PRICE" From sales s Group By s.ID, s.DTTM, s.PRICE UNION ALL
                  Select c.TP "TP", c.ID, c.DTTM, Max(c.PRODUCT) "PRODUCT", c.PRICE "PRICE" From cdc c Group By c.ID, c.DTTM,  c.TP, c.PRICE) 
            Group By ID, TP, DTTM, PRICE
        )   
--
--  e v e n t _ c a r d s   -->   R e s u l t   
--
--  TOTAL_EVENTS EVENT_NUMBER         ID TP     DTTM                         PRODUCT       PRICE
--  ------------ ------------ ---------- ------ ---------------------------- -------- ----------
--             2            1          1 INIT   01-JAN-14 01.02.03.000000000 phone         14.99 
--             2            2          1 DELETE 01-JAN-14 04.02.03.000000000                     
--             1            1          2 INIT   01-JAN-14 03.02.03.000000000 car            1200 
--             2            1          3 APPEND 02-JAN-14 04.02.03.000000000 computer        799 
--             2            2          3 UPDATE 02-JAN-14 04.02.03.000000000 computer        805    
--

I call it event cards, but as you can see, it is a kind of a transaction log actually.
From event cards take the rows having TOTAL_EVENTS = EVENT_NUMBER and exclude deleted IDs.
Final result is at the end of the SQL below

SELECT ID "ID", DTTM "DTTM", PRODUCT "PRODUCT", PRICE "PRICE"
FROM event_cards e
WHERE TOTAL_EVENTS = EVENT_NUMBER And TP != 'DELETE'
ORDER BY ID, DTTM
--
--  R e s u l t
--
--          ID DTTM                         PRODUCT       PRICE
--  ---------- ---------------------------- -------- ----------
--           2 01-JAN-14 03.02.03.000000000 car            1200 
--           3 02-JAN-14 04.02.03.000000000 computer        805

If there were more data through longer period of time, using event cards gives you freedom to get the state of the table at any time (in the past) by simple addition to the where clause like
"and DTTM <= AnyDateTimeYouWant" Regards...

d r
  • 3,848
  • 2
  • 4
  • 15
0

You can use something like this:

SELECT * FROM
(
SELECT cdc.type,     Switch(
        type = "APPEND", 2,
        type = "UPDATE", 3,
        type = "DELETE", 4
        ) AS typeid, cdc.id, cdc.timestamp, cdc.product, cdc.price
FROM cdc 
UNION 
SELECT "Current" AS type, 1 AS typeid, sales.id, sales.timestamp, sales.product, sales.price
FROM sales
)
ORDER BY id, typeid

You may have WHERE and other query clauses on both sub-queries and outer query

You can use a CASE statement for setting typeid for each transaction type in inner queries and then order outer query by typeid

Note: The query is written in MS Access

Eskandar Abedini
  • 2,090
  • 2
  • 13
0

Here is a query to do it in Oracle:

WITH dat AS 
(
  SELECT '1' AS ID, TO_DATE('2014-01-01 01:02:03','YYYY-MM-DD HH24:MI:SS') AS TIMESTAMP, 'phone' AS product, 14.99 AS price FROM dual UNION ALL
  SELECT '2', TO_DATE('2014-01-01 03:02:03','YYYY-MM-DD HH24:MI:SS'), 'car'  , 1200.00 FROM dual
), operation AS
(
  SELECT 'APPEND' AS TYPE, '3' AS ID, TO_DATE('2014-01-02 04:02:03','YYYY-MM-DD HH24:MI:SS') AS TIMESTAMP,'computer' AS product, 799.00 AS price FROM dual UNION ALL
  SELECT 'DELETE' , '1', TO_DATE('2014-01-01 04:02:03','YYYY-MM-DD HH24:MI:SS') , NULL     , NULL   FROM dual UNION ALL
  SELECT 'UPDATE' , '3', TO_DATE('2014-01-02 04:02:03','YYYY-MM-DD HH24:MI:SS') ,'computer', 805.00 FROM dual
)
SELECT ID, TIMESTAMP, product, price
  FROM operation op
 WHERE timestamp < TO_DATE('&ts','YYYY-MM-DD HH24:MI:SS')
   AND TYPE IN ('APPEND')
   AND NOT EXISTS (SELECT 1
                     FROM operation
                    WHERE TIMESTAMP >= op.timestamp 
                      AND ID        = op.id
                      AND TYPE      = 'UPDATE')
 UNION 
SELECT ID, TIMESTAMP, product, price
  FROM operation op
 WHERE timestamp < TO_DATE('&ts','YYYY-MM-DD HH24:MI:SS')
   AND TYPE IN ('UPDATE')
   AND NOT EXISTS (SELECT 1
                     FROM operation
                    WHERE TIMESTAMP > op.timestamp 
                      AND timestamp < TO_DATE('&ts','YYYY-MM-DD HH24:MI:SS')
                      AND ID        =  op.id
                      AND TYPE      =  op.type)
 UNION 
SELECT ID, TIMESTAMP, product, price
  FROM dat
 WHERE timestamp < TO_DATE('&ts','YYYY-MM-DD HH24:MI:SS')
 MINUS
SELECT dt.ID, dt.TIMESTAMP, dt.product, dt.price
  FROM operation op
     , dat       dt
 WHERE op.id     = dt.id
   AND op.timestamp >= dt.timestamp
   AND op.timestamp <= TO_DATE('&ts','YYYY-MM-DD HH24:MI:SS')
   AND op.type IN ('DELETE');
Gnqz
  • 3,292
  • 3
  • 25
  • 35