3

I'm wondering if it's possible (and how to go about it if it is)

to partition a table in postgres 9.1 based on a date field being null or not null.

So effectively, I have a table and it gets quite large, I have a "delete_on" date in there and when items get deleted they just get tagged as deleted.

This would be ideal to archive them off and keep the active tables small.

I was thinking of a table partition and just having something like

active (delete_on = NULL)

archive (deleted_on != NULL)

Then when I do an update on a record, and set it's deleted on, it would get written to the archive table.

I could do it manually by just creating two separate tables and writing code to copy the data over, but I'd like it if I could just put it on the database directly.

Any ideas?

Edit:

Master table

CREATE TABLE my_table (
    id              int not null,
    deleted         date not null,
    number1         int,
    number2         int
);

Partitions

CREATE TABLE my_table_active (
  CHECK (deleted IS NULL)
) INHERITS (my_table);
CREATE TABLE my_table_archive ( 
  CHECK (deleted IS NOT NULL)
) INHERITS (my_table);

Trigger Function

CREATE OR REPLACE FUNCTION my_table_insert_trigger()
RETURNS TRIGGER AS $$
BEGIN
    IF ( NEW.deleted IS NULL ) THEN
        INSERT INTO my_table_active VALUES (NEW.*);
    ELSE 
        INSERT INTO my_table_archive VALUES (NEW.*);

    END IF;
    RETURN NULL;
END;
$$
LANGUAGE plpgsql;

Create Trigger

CREATE TRIGGER insert_my_table_trigger
    BEFORE INSERT ON my_table
    FOR EACH ROW EXECUTE PROCEDURE my_table_insert_trigger();

(I've taken the above from the Postgres documentation)

How would this work if you're doing an update on a field that is already in the active table?

So I have a record in the active table (i.e. delete is null) then I set that record to be deleted (by giving it a date stamp). At that point i want it moved to the archive table.

TheStoneFox
  • 3,007
  • 3
  • 31
  • 47
  • Well the state is either NULL (i.e. empty or not set) or it's a date (the date the record was deleted) so it' not technically a boolean. However, thinking about this problem even more I'm thinking do I even need to bother partitioning? The more records that get set with a deleted_on date then I just need to optimise my queries to only ever search records that have a deleted on of null and won't that make the query planner only check the smaller recordset anyway? – TheStoneFox Mar 06 '13 at 19:06

1 Answers1

5

You can partition on IS NULL and IS NOT NULL.

= NULL and != NULL both always result in NULL, so they conflict. You never write = NULL in SQL, always a IS NULL or a IS DISTINCT FROM b.

For the details of how to set up PostgreSQL's table-inheritance based partitioning and associated constraint exclusion, see the documentation.

There's no need to "write code to copy the data over" in any case. At worst you'd need to write a simple INSERT INTO newtable SELECT * FROM oldtable WHERE somefield IS NULL;.

Craig Ringer
  • 307,061
  • 76
  • 688
  • 778
  • Hi, Thanks very much for the info. Would I need to just set up a partition table or do I need to set a trigger up? I'll update my original question with my "stab" at what I think I need. – TheStoneFox Mar 05 '13 at 10:23
  • @Tada.wav You can either have your application insert directly into the partitions, or have your application insert into the top-level table and get a trigger to redirect writes into the partitions. Getting the app to write directly into the partitions is more efficient, but requires the app to understand the partitioning scheme and thus makes it harder to change later, so most people use triggers. – Craig Ringer Mar 05 '13 at 11:06
  • Hi, I'd prefer to use the triggers if possible, but I'm still not sure how they would handle an update. If I have a table in the active partition and I set it's deleted date then how does the trigger know to delete it from the active table and write it to the archive partition? – TheStoneFox Mar 05 '13 at 11:14
  • @Tada.wav See http://stackoverflow.com/q/1798209/398670 . Basically, you `UPDATE` the top-level partition and the `BEFORE UPDATE ... FOR EACH ROW` trigger converts updates, or at least cross-partition updates, into a `DELETE` and an `INSERT`. That's effectively how PostgreSQL updates work internally anyway, you're just doing it between two tables. Partitioning in Pg is way harder than it should be and we'd all love to improve it. There's been some work on it recently, so with luck maybe post-9.3... – Craig Ringer Mar 05 '13 at 11:18
  • Thanks for the info, I'll mark this as the answer as it leads to a solution. – TheStoneFox Mar 05 '13 at 15:30