3

I'm trying to speed up this PostgreSQL query to find previous entries "on this day" in past years from a table. I currently have the query below:

select * from sample
where date_part('month', "timestamp") = date_part('month', now())
and date_part('day', "timestamp") = date_part('day', now())
order by "timestamp" desc;

This seems to get the intended result, but it is running much slower than desired. Is there a better approach for comparing the current month & day?

Also, would there be any changes to do a similar search for "this hour" over the past years? Similar to the following:

select * from sample
where date_part('month', "timestamp") = date_part('month', now())
and date_part('day', "timestamp") = date_part('day', now())
and date_part('hour', "timestamp") = date_part('hour', now())
order by "timestamp" desc;

The data is time-series in nature, using TimescaleDB as the database. Here is the current definition:

CREATE TABLE public.sample (
    "timestamp" timestamptz NOT NULL DEFAULT now(),
    entity varchar(256) NOT NULL,
    quantity numeric NULL
);
CREATE INDEX sample_entity_time_idx ON public.sample (entity, "timestamp" DESC);
CREATE INDEX sample_time_idx ON public.sample ("timestamp" DESC);
jheddings
  • 26,717
  • 8
  • 52
  • 65
  • You want to compare on an actual date range. Somebody else will know the Postgres syntax. Something like `where "timestamp" >= now()::date and "timestamp" < now()::date + interval '1 day'` – shawnt00 Jul 26 '22 at 19:31
  • 1
    Please share your table definition and if there is index, also any of them. – Umut TEKİN Jul 27 '22 at 00:45
  • @TheImpaler thanks for the pointer... That's returning all entries for today only, not "on this day from previous years." – jheddings Jul 27 '22 at 16:48
  • @jheddings Oops... I didn't understand the question. Let me think... – The Impaler Jul 27 '22 at 16:58

1 Answers1

2

If you need the same day in all previous years that I would guess that the query would return 1/365th of the rows; that's a 0.27% selectivity. Great.

With that selectivity an index can speed up the query significantly. Now, since you are selecting non-consecutive rows you'll need a functional index. I would try:

create index ix1 on sample ((date_part('doy', "timestamp")));

Then, you can modify your query to:

select * 
from sample
where date_part('doy', "timestamp") = date_part('doy', now())
order by "timestamp" desc;

For the current hour in the past years you would have an even better selectivity of around 1/365/24; that is 0.01%. Awesome.

create index ix2 on sample (
  (date_part('doy', "timestamp")), 
  (date_part('hour', "timestamp"))
);

Then, the new query could look like:

select * 
from sample
where date_part('doy', "timestamp") = date_part('doy', now()) 
  and date_part('hour', "timestamp") = date_part('hour', now()) 
order by "timestamp" desc;

Please post the execution plans of these queries with the indexes created. I'm curious to see how well the perform.

The Impaler
  • 45,731
  • 9
  • 39
  • 76
  • 1
    Ah, clever. I didn't consider using a function in the index. When trying to create the index, however, it looks like date_part is not IMMUTABLE (and thus the CREATE INDEX fails). I'll keep poking around. – jheddings Jul 27 '22 at 17:36
  • 2
    It seems it's the `TIMESTAMPTZ` data type. The function may produce different result in different environments. If you used `TIMESTAMP` the solution works. See https://dbfiddle.uk/?rdbms=postgres_14&fiddle=4a75da04a1b8e13a8b31a88fbfae2af9 In fact, I'm thinking your queries may not be working correctly under all circumstances, but this is just a guess. – The Impaler Jul 27 '22 at 17:42
  • 1
    Got it, thanks. The data set is an existing table over 7 years with 22M entries, so I'm not sure if I can change the data type... Let me see if I can make some modifications or work on a copy. – jheddings Jul 27 '22 at 18:04