0

I have a history table that contains the records of each customer with start_date and end_date columns that indicate validity period of each rows. The table looks like this:

| ID  | Name | Code  |start_date (Timestamp) |end_date (Timestamp) |
|:--- |:----:|:-----:|----------------------:|--------------------:|
|123  | John | 100   |2021/1/6   8:00:00     |2021/1/31  8:00:00   | 
|123  | John | 101   |2021/1/31  8:00:00     |2021/2/15  8:00:00   | 
|123  | John | 102   |2021/2/15  8:00:00     |2021/3/15  8:00:00   | 
|123  | John | 103   |2021/3/15  8:00:00     |2021/6/15  9:00:00   | 
|123  | John | 105   |2021/6/15  9:00:00     |2021/6/15  9:15:00   | 
|123  | John | 106   |2021/6/15  9:15:00     |2021/6/15 10:00:00   |
|123  | John | 107   |2021/6/15 10:00:00     |2021/7/15 15:00:00   | 
|123  | John | 108   |2021/7/15 15:00:00     |null                 |

I decided to use "expand on" function to produce a column that would show "monthly" records (end of each month records). Desired output should look like this, where if end_date is null it should expand the record up until the current date but my expand syntax does not work correctly:

| ID  | Name | Code  |start_date (Timestamp) |end_date (Timestamp) |end_of_month
|---- |------|-------|-----------------------|---------------------|------------
|123  | John | 101   |2021/1/31  8:00        |2021/2/15  8:00:00   | 2021/1/31
|123  | John | 102   |2021/2/15  8:00        |2021/3/15  8:00:00   | 2021/2/28
|123  | John | 103   |2021/3/15  8:00        |2021/6/15  9:00:00   | 2021/3/31
|123  | John | 103   |2021/3/15  8:00        |2021/6/15  9:00:00   | 2021/4/30
|123  | John | 103   |2021/3/15  8:00        |2021/6/15  9:00:00   | 2021/5/31
|123  | John | 107   |2021/6/15 10:00        |2021/7/15 15:00:00   | 2021/6/30
|123  | John | 108   |2021/7/15 15:00        |?                    | 2021/7/30
|123  | John | 108   |2021/7/15 15:00        |?                    | 2021/8/31
|123  | John | 108   |2021/7/15 15:00        |?                    | 2021/9/30

I have below sql but it excludes the last record which is end_of month "2021/9/30". If I set the ANCHOR to "MONTH_BEGIN", The record "2021/9/30" will appear but it will exclude the record 2021/7/30 in return.

select a.id, a.name, a.code, a.start_date, a.end_date, last_day(BEGIN(bg2)) 
as end_of_month from (select id, name, code, start_date, end_date, 
period(start_date,COALESCE(MIN(start_date) OVER (PARTITION BY name ORDER BY start_date 
ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING), CURRENT_TIMESTAMP)) as bg from CHESS.HST) as a
expand on bg as bg2 by ANCHOR MONTH_END
Jojo10478
  • 29
  • 6
  • You can simply `add_months(CURRENT_TIMESTAMP, 1)` – dnoeth Sep 27 '21 at 15:55
  • hello, I am sorry but could you elaborate on that? I tried to use add_months at the end part like `where bg2 > add_months(current_timestamp,-1)` to get last month record but I got an error. It looks like add_months cannot be used for expanded column – Jojo10478 Sep 27 '21 at 16:02
  • No, it's in the period creation. Btw, based on your data you don't need the MIN/OVER, it already includes both start and end.. – dnoeth Sep 27 '21 at 16:04
  • I am sorry, I don`t get it, I am honestly new to sql. I would really appreciate it if you could help me understand how to write it. – Jojo10478 Sep 27 '21 at 16:12
  • I actually have this sql before. But the problem is it cannot read the row with end_date which is null so I cannot expand that row upto the current_date. `select a.id, a.name, a.code, a.start_date, a.end_date, BEGIN(bg) as end_of_month from CHESS.HST a qualify row_number() over (partition by trunc(start_date, 'MON') order by start_date desc) = 1 EXPAND ON PERIOD(start_date, next(end_date)) AS bg BY ANCHOR MONTH_END` – Jojo10478 Sep 27 '21 at 16:20
  • 1
    That's what dnoeth is saying. You want to use this for your period: `period(start_date,coalesce(end_date,add_months(current_timestamp,1))) as bg` – Andrew Sep 27 '21 at 16:55
  • thank you @Andrew! now it makes sense to me. – Jojo10478 Sep 28 '21 at 04:26

1 Answers1

2

This seems to match your requested result:

select a.id, a.name, a.code, a.start_date, a.end_date,
   last_day(begin(bg2)) as end_of_month
from
 (
   select id, name, code, start_date, end_date, 
      period(start_date,COALESCE(end_date
                                ,add_months(CURRENT_TIMESTAMP(0), 1)
                                )
            ) as bg
   from HST
 ) as a
expand on bg as bg2 
    by ANCHOR MONTH_END
dnoeth
  • 59,503
  • 4
  • 39
  • 56
  • I just realized how expand-on works and now it makes sense to me why I needed to add 1 month in the period. Thank you very much! But one thing I am not sure of, does expand_on `ANCHOR MONTH_END` "always" select the latest record of the month before it does the expansion? because if there are multiple records on single day with 1 minute or some seconds difference in its timestamp, I wonder if expand_on will always choose the latest one. – Jojo10478 Sep 28 '21 at 04:25
  • No, it does not select the *latest* row, EXPAND is row-based, as each row is expanded you might have multiple rows for the same month. My answer to your previous question shows how to get one row per month: https://stackoverflow.com/a/69094883/2527905 – dnoeth Sep 28 '21 at 08:18
  • I will freely admit I get confused with the anchor bit, but I think using `MONTH_END` is causing the first row (2021-01-06) to fall out. If you use something like `DAY`, you will see rows for that date. – Andrew Sep 28 '21 at 14:12
  • I have tested it many times. Luckily my table contains only 1 record for every end of the month so I am accepting this as the answer. Thank you so much @dnoeth!! – Jojo10478 Sep 30 '21 at 10:37
  • Hi @Andrew you are right! There is just a little problem with `expand on month_end`. The date and time information in the end_of_month column is not correct due to time zone difference. Changing the time zone interval setting will resolve the problem, but it seems not possible for Views. – Jojo10478 Sep 30 '21 at 11:03
  • The time zone is added due to CURRENT_TIMESTAMP, you can switch to either `CAST(CURRENT_TIMESTAMP(0) AS TIMESTAMP(0))` or `CAST(CURRENT_DATE AS TIMESTAMP(0))`. – dnoeth Sep 30 '21 at 11:30
  • Hello, I know its been a while since I posted about this but I am just wondering if you could give me some advice about using expand on. I am trying to use `expand on` in millions of records and it is so slow. Do you know some ways on how to speed up the process of expand on? should I partition the start_date by month before expanding it? Would really appreciate any advice. Thank you. – Jojo10478 Oct 19 '21 at 06:41
  • Hard to tell without details. EXPAND ON is fast, it's just creating additional rows within AMP. Is it part of a more complex query, can you show Explain? How many rows before and after expansion? – dnoeth Oct 19 '21 at 08:12
  • I created two views. 1 view expands all the data of a certain table, called this master view. And the other view selects the record of a specific month (last month, two months ago, etc) from the master view. This is the explain of the master view: – Jojo10478 Oct 21 '21 at 05:51
  • 1. First, we lock SMB_SEC.TABLE in view NXTG.TRP001 for access. 2. Next, we do an ALL AMPS RETRIEVE step from SMB_SEC.T1 in view NXTG.TRP001 by way of an all-rows scan with no residual conditions and EXPAND ON ("PERIOD(SMB_SEC.T1 in viewNXTG.TRP001.START_DATE, (CASE WHEN(NOT(SMB_SEC.T1 in view NXTG.TRP001.END_DATE is NULL)) THEN SMB_SEC.T1 is view NXTG.TRP001.END_DATE) ELSE(ADD_MONTHS((CURRENT_TIMESTAMP(0)),(1))))END)) (NAMED BG)") into spool(used to materialize view, derived table, table function or table operator h1)(all-amps), which is built locally on the AMPS. – Jojo10478 Oct 21 '21 at 05:53
  • The input table will not be cached in memory but it is eligible for synchronized scanning. The size of spool 1 is estimated with no confidence to be 1,216,666,667 rows (3,431,000,000,000 bytes)The estimated time for this step is 4 mins and 57 seconds. 3. We do an all amps sum step to aggregate from spool 1 (lastuse) by way of cylinder index scan. Aggregate intermediate results are computed globally, then placed in spool 5. The size of spool 5 is s estimated with high confidence to be 1 row (23 bytes)The estimated time for this step is 2 minutes and 3 seconds. – Jojo10478 Oct 21 '21 at 05:54
  • The estimated row count of 1.2 bln might be much higher after expand. The explain indícates you're simply counting the number of rows after expand, is this correct? – dnoeth Oct 21 '21 at 07:57
  • yes, I am simply counting the rows. because selecting the data itself seems to take longer. but in the end, what I really want to do is to select the data for each month using the second view. – Jojo10478 Oct 22 '21 at 00:52
  • Do you think it will be easier for me to select the month-end record of each month if I use a Temporal table instead? that seems to be the case but I am not certain about it, I am still trying to figure out how to use the Temporal table. – Jojo10478 Oct 22 '21 at 00:55
  • *Temporal* tables implement *Slowly Changing Dimensions (SCD) Type2*, but you probably mean *Time Series* tables, see my [answer](https://stackoverflow.com/a/69094883/2527905) to a previous (but probably the same) question. If you wrap the expand in a view the optimizer might not be able to push WHERE-conditions on it, i.e. apply them *before* expand. Depending on your needs a *Macro* might be better. To get the *month-end record* there will be several solutions, the best one is hard to tell without knowing all the details (even harder for an SQL novice) – dnoeth Oct 22 '21 at 07:38
  • my table is a history table( which is being updated by Trigger everytime a change:"insert, update, delete" is made from the master table). Having the start_date and end_date column tells me when the record is inserted/updated/deleted. insert and update will add new row to the history table while delete just updates the end_date of the existing record of the history table.And my goal is to select those records at a specific point of time: end of month (this month, last month, two months ago etc.). – Jojo10478 Oct 22 '21 at 08:13
  • I was thinking if i could replace the history table with temporal table if it would make it easier for me to select the records at the end of each month. I wish I could use the time-series table too but it does not work for me as of the moment. will check more on that. thank you so much for your comment @dnoeth – Jojo10478 Oct 22 '21 at 08:14
  • With a Temporal Table you can easily get all the data at a specific point in time, see [this whitepaper](https://cs.ulb.ac.be/public/_media/teaching/infoh415/teradata_temporal_case_study.pdf). But if you already have a properly maintained Slowly Changing Dimension with begin/end timestamps you just need to code the correct WHERE-conditions, e.g. 'where timestamp '2021-09-30 23:59:59' between start_date and prior(end_date)` – dnoeth Oct 22 '21 at 14:46
  • Hi @dnoeth, thank you very much for this whitepaper! I find this very useful. We have just created the history table and currently testing it. I am thinking of suggesting to replace it with transaction time table (temporal table) if selecting month end data will become easier and faster. I`m going to test it. – Jojo10478 Oct 25 '21 at 14:01