Postgresql - Querying a column that is indexed by proxy of another column

Question

I have the following query:

SELECT SUM(data), foreign_key
FROM (SELECT * 
    FROM really_big_table
    ORDER BY auto_incremented_id DESC
    LIMIT reasonable_number)
WHERE inserted_timestamp > now() - INTERVAL '1 hour'
GROUP BY foreign_key

This query successfully averts running a sequential scan over inserted_timestamp, but it will utterly fail if there is more than a reasonable number of rows that I need to retrieve. Since inserted_timestamp is not indexed, but follows the same sequence as auto_incremented_id, I feel like I can make this query even more efficient without causing an hour's downtime creating a new index.

I would like to do something like this:

SELECT SUM(data), foreign_key
FROM really_big_table
ORDER BY id DESC
STOP WHEN created < now() - INTERVAL '1 hour'
GROUP BY foreign_key

In other words, I want the syntax so that my query will run an index scan of my table, and stop when the data is too old.

Is the field typed `timestamp` or `timestamp with time zone`? Show the schema please. — Craig Ringer, Jul 12 '14 at 02:36
You could search (binary or whatever) for the point at which the `auto_incremented_id` has the value (approximately) equal to `now() - INTERVAL '1 hour'`. Then, use a where-clause on that `>that_id`. But the "binary search" would have to be done procedurally (as a store proc or external to SQL). It also assumes the id-timestamp relationship is real, which is not truly guaranteed. — Dwayne Towell, Jul 12 '14 at 02:47
@CraigRinger I have similar queries running on multiple tables. I'm pretty sure they all are just normal timestamps, but I can't be sure until I go back to work on Monday. — Jacklynn, Jul 12 '14 at 21:57

tsnorri · Accepted Answer · 2014-07-14T17:57:55.640

One possibility to speed things up is to use table partitioning if you're not already doing that.

Here's another idea:

BEGIN;
DECLARE my_cursor NO SCROLL CURSOR FOR
    SELECT data, foreign_key, inserted_timestamp
    FROM really_big_table
    ORDER BY id DESC;
FETCH FORWARD 5 FROM my_cursor;
-- Repeat as many times as you want
CLOSE my_cursor;
ROLLBACK; -- Or COMMIT

And calculate the sum in your application, or, if you want to do it in the database:

CREATE FUNCTION my_fetch() RETURNS SETOF really_big_table AS $$
DECLARE
    -- You could also select only the relevant columns here and change
    -- the function's return type.
    curs CURSOR FOR
        SELECT * FROM really_big_table ORDER BY id DESC;
BEGIN
    FOR current_row IN curs LOOP
        IF current_row.inserted_timestamp > CURRENT_TIMESTAMP - INTERVAL '1 hour' THEN
            RETURN NEXT current_row;
        ELSE
            RETURN;
        END IF;
    END LOOP;
    RETURN;
END
$$ STABLE LANGUAGE plpgsql;

Then you can do this:

SELECT SUM(data), foreign_key FROM my_fetch() GROUP BY foreign_key;

Postgresql - Querying a column that is indexed by proxy of another column

1 Answers1