0

my question is basically the same as this one, but i couldn't find an answer, its also written "to be solved in the next release" and "easy for min/max scans"

PostgreSQL+table partitioning: inefficient max() and min()

CREATE TABLE mc_handstats
(
  id integer NOT NULL DEFAULT nextval('mc_handst_id_seq'::regclass),
  playerid integer NOT NULL,
  CONSTRAINT mc_handst_pkey PRIMARY KEY (id),
);

table is partitioned over playerid.

CREATE TABLE mc_handst_0000 ( CHECK ( playerid >= 0 AND playerid < 10000) ) INHERITS (mc_handst) TABLESPACE ssd01;
CREATE TABLE mc_handst_0010 ( CHECK ( playerid >= 10000 AND playerid < 30000) ) INHERITS (mc_handst) TABLESPACE ssd02;
CREATE TABLE mc_handst_0030 ( CHECK ( playerid >= 30000 AND playerid < 50000) ) INHERITS (mc_handst) TABLESPACE ssd03;
...

CREATE INDEX mc_handst_0000_PlayerID ON mc_handst_0000 (playerid);
CREATE INDEX mc_handst_0010_PlayerID ON mc_handst_0010 (playerid);
CREATE INDEX mc_handst_0030_PlayerID ON mc_handst_0030 (playerid);
...

plus create trigger on playerID

i want to get the last id (i could also get the value for the sequence, but i am used to work with tables/colums), but pSQL seems to be rather stupid scanning the table:

EXPLAIN ANALYZE select max(id) from mc_handstats; (the real query runs forever)

"Aggregate  (cost=9080859.04..9080859.05 rows=1 width=4) (actual time=181867.626..181867.626 rows=1 loops=1)"
"  ->  Append  (cost=0.00..8704322.43 rows=150614644 width=4) (actual time=2.460..163638.343 rows=151134891 loops=1)"
"        ->  Seq Scan on mc_handstats  (cost=0.00..0.00 rows=1 width=4) (actual time=0.002..0.002 rows=0 loops=1)"
"        ->  Seq Scan on mc_handst_0000 mc_handstats  (cost=0.00..728523.69 rows=12580969 width=4) (actual time=2.457..10800.539 rows=12656647 loops=1)"
...
ALL TABLES
...
"Total runtime: 181867.819 ms"

EXPLAIN ANALYZE select max(id) from mc_handst_1000

"Aggregate  (cost=83999.50..83999.51 rows=1 width=4) (actual time=1917.933..1917.933 rows=1 loops=1)"
"  ->  Seq Scan on mc_handst_1000  (cost=0.00..80507.40 rows=1396840 width=4) (actual time=0.007..1728.268 rows=1396717 loops=1)"
"Total runtime: 1918.494 ms"

the runtime for the partitioned table is 'snap', and completely off the record over the master table. (postgreSQL 9.2)


\d mc_handstats (only the indexes)

Indexes:
    "mc_handst_pkey" PRIMARY KEY, btree (id)
Foreign-key constraints:
    "mc_handst_playerid_fkey" FOREIGN KEY (playerid) REFERENCES mc_players(id)
Triggers:
    mc_handst_insert_trigger BEFORE INSERT ON mc_handstats FOR EACH ROW EXECUTE PROCEDURE mc_handst_insert_function()
Number of child tables: 20 (Use \d+ to list them.)

\d mc_handst_1000

Indexes:
    "mc_handst_1000_playerid" btree (playerid)
Check constraints:
    "mc_handst_1000_playerid_check" CHECK (playerid >= 1000000 AND playerid < 1100000)

hm, no PK index in the sub tables. while i don't understand why the result for max(id) is pretty fast on the subtables (as there is no index) and slow from the master table, it seems i need to add an index for PK also for all subtables. maybe that solves it.


CREATE INDEX mc_handst_0010_ID ON mc_handst_0010 (id);
... plus many more ...

and everything fine. still strange why it worked fast on the subtables before, that made me think they are indexed, but i also don't care to much.

thanks for this!

Community
  • 1
  • 1
Chris
  • 129
  • 1
  • 2
  • 11
  • 3
    The table is partitioned on playerid and you search for the max(id). Why do you expect it to do any other thing than a sequential scan on all tables? The max(id) can be in any of the tables. Also there is no index on id. – Clodoaldo Neto Jun 04 '13 at 12:26
  • ID is the primary key, its automatically indexed, no? (the results are fast when i just run the query over a single table) – Chris Jun 04 '13 at 13:04
  • 1
    Chris: Is it actually the PK in each partition? Show `\d mc_handst_1000` and `\d mc_handstats`. Partitioning is a bit ... primitive ... and you'll find that *unique constraints only apply within a partition, not across partitions*; this extends to primary key constraints. – Craig Ringer Jun 04 '13 at 13:04
  • no, its not. the partitioning is not done via the PK, its done via another field, which is also NOT unique btw (but thats not a problem afair). i'll add \d above. – Chris Jun 04 '13 at 13:06

2 Answers2

0

The first thing you need to do is index all the child tables on (id) and see if max(id) is smart enough to do an index scan on each table. I think i should be but I am not entirely sure.

If not, here's what I would do: I would start with currval([sequence_name]) and work back until a record is found. You could do something check blocks of 10 at a time, or the like in what is essentially a sparse scan. This could be done with a CTE like such (again relies on indexes):

 WITH RECURSIVE ids (
      select max(id) as max_id, currval('mc_handst_id_seq')  - 10 as min_block
        FROM mc_handst
       WHERE id BETWEEN currval('mc_handst_id_seq') - 10 AND currval('mc_handst_id_seq')
      UNION ALL
      SELECT max(id), i.min_block - 10
        FROM mc_handst
        JOIN ids i ON id BETWEEN i.min_block - 10 AND i.min_block
       WHERE i.max_id IS NULL
 )
 SELECT max(max_id) from ids;

That should do a sparse scan if the planner won't use an index once the partitions are indexed. In most cases it should only do one scan but it will repeat as necessary to find an id. Note that it might run forever on an empty table.

Chris Travers
  • 25,424
  • 6
  • 65
  • 182
0

Assuming a parent's table like this:

CREATE TABLE parent AS (
  id not null default nextval('parent_id_seq'::regclass)
  ... other columns ...
);

Whether you're using a rule or a trigger to divert the INSERTs into the child tables, immediately after the INSERT you may use:

SELECT currval('parent_id_seq'::regclass);

to get the last id inserted by your session, independently of concurrent INSERTs, each session having its own copy of the last sequence value it has obtained.

https://dba.stackexchange.com/questions/58497/return-id-from-partitioned-table-in-postgres