How to fill gaps in timestream query

Question

I am very new to aws timestream, but from the docs, it looks like the perfect place to add trade data. I was using this data to create a candlestick graph. But when there are long periods of time when no trade happened, the time series is not contiguous. Is there any way to make the time buckets contiguous by adding an empty candle (ie all 0 values) in all missing time buckets? I have read about INTERPOLATE_FILL but couldn't get how to use it correctly for my use case. Here is the query I'm currently using without any interpolation.

SELECT 
    BIN(time, 5m) as timestamp,
    COALESCE(MAX(price),0) as high,
    COALESCE(MIN(price),0) as low,
    COALESCE(MAX_BY(price, time),0) as close,
    COALESCE(MIN_BY(price, time),0) as open,
    COALESCE(SUM(quantity),0) as volume
FROM "test"."test"
WHERE market = 'BTC/USDT' AND time BETWEEN from_iso8601_timestamp('2022-03-05T07:59:45.302Z') AND from_iso8601_timestamp('2023-03-05T07:59:45.302Z') 
GROUP BY BIN(time, 5m)
ORDER BY timestamp DESC

score 1 · Answer 1 · answered Mar 14 '23 at 11:18

There are several good examples in the document https://docs.aws.amazon.com/timestream/latest/developerguide/timeseries-specific-constructs.functions.interpolation.html

To fill default value in missing time points, interpolate_fill is the one you are looking for.

Here I wrote an example of getting a flat view of high/low price every 5 minutes per market. Haven't verified the syntax but just give you an idea of how to do that

with data_points as (
SELECT 
    BIN(time, 5m) as ts,
    COALESCE(MAX(price),0) as high,
    COALESCE(MIN(price),0) as low
FROM "test"."test"
WHERE time BETWEEN from_iso8601_timestamp('2022-03-05T07:59:45.302Z') AND from_iso8601_timestamp('2023-03-05T07:59:45.302Z') 
GROUP BY BIN(time, 5m)),
-- here you can create time series using INTERPOLATE_FILL function
series as (
select market,
  interpolate_fill(create_time_series(ts, high), SEQUENCE(min(ts), max(ts), 5m), 0) AS high_series,
  interpolate_fill(create_time_series(ts, low), SEQUENCE(min(ts), max(ts), 5m), 0) AS low_series
from data_points
group by market
)
-- at this point, `series` gives you 5 time series per market where 0 is filled in at missing date point
-- next step is to flat the data by using unnest
SELECT s1.market, hs.ts, hs.high, ls.low
from series s1
 cross join unnest(high_series) AS hs (ts, high)
left join series s2
 cross join unnest(low_series) as ls (ts, low)
 on hs.ts = ls.ts and s1.market = s2.market
order by hs.ts asc

hei @Liu thanks for the reply, Here is an implementation based on your answer. https://gist.github.com/nuel77/69ec7e2497e3d40e209028ae7c69cf87 its quite lengthy though :| — Emmanuel Thomas, Apr 17 '23 at 10:29

How to fill gaps in timestream query

1 Answers1