0

Is there a way to force Greenplum PostgreSQL to materialize a subquery in a WITH clause like what MATERIALIZE and INLINE optimizer hints do as below in Oracle?

WITH dept_count AS (
  SELECT /*+ MATERIALIZE */ deptno, COUNT(*) AS dept_count
  FROM   emp
  GROUP BY deptno)
SELECT ...

I've been searching this for a while, only to find this functionality in Oracle.

I know I can use CREATE TABLE AS, but I have several similar queries, forcing me to drop the temporary table after each query, which is very inconvenient and maybe inefficient.

Update: I tested the following table:

CREATE TABLE test (id: INT);

EXPLAIN WITH test2 AS (SELECT id FROM test)
SELECT COUNT(*) FROM test2;

                                 QUERY PLAN                                     
------------------------------------------------------------------------------------
Aggregate  (cost=0.36..0.37 rows=1 width=8)
   ->  Gather Motion 32:1  (slice1; segments: 32)  (cost=0.01..0.35 rows=1 width=8)
         ->  Aggregate  (cost=0.01..0.01 rows=1 width=8)
               ->  Subquery Scan test2  (cost=0.00..0.00 rows=1 width=0)
                     ->  Seq Scan on test  (cost=0.00..0.00 rows=1 width=4)

I'm using Greenplum Postgresql 8.2

Craig Ringer
  • 307,061
  • 76
  • 688
  • 778
Yang
  • 7,712
  • 9
  • 48
  • 65
  • Please post `explain analyze` output. – Craig Ringer Mar 09 '13 at 02:05
  • PostgreSQL 8.2? CTEs have been introduced with Postgres 8.4. You mean 9.2 by any chance? – Erwin Brandstetter Mar 09 '13 at 02:19
  • It's 8.2. I'm running on Greenplum, and Greenplum only supports older versions of Postgres – Yang Mar 09 '13 at 02:23
  • 2
    I am confused. How do you run a CTE in 8.2? Or is this the EXPLAIN output from Oracle? Also, under posts that are not mine you need to use [`@`-reply](http://meta.stackexchange.com/questions/43019/how-do-comment-replies-work) in comments to make sure I get a notification. – Erwin Brandstetter Mar 09 '13 at 02:40
  • @ErwinBrandstetter This is psql (8.2.15), the plan is not materializing anything. – Yang Mar 09 '13 at 02:49
  • [psql](http://www.postgresql.org/docs/current/interactive/app-psql.html) (the command line tool) may be version 8.2, but Postgres cannot - or it wouldn't have CTEs. Or did Greeplum backpatch CTEs? I would be rather surprised. – Erwin Brandstetter Mar 09 '13 at 03:07
  • @Yang Please show the exact text of the output of `SELECT version()`. – Craig Ringer Mar 09 '13 at 07:56
  • 2
    @Yang Also, if you're using Greenplum or similar please say so up front. It's very confusing when people post about 3rd party PostgreSQL variants without saying so, since all sorts of modifications have been made. *PostgreSQL doesn't even have a `Gather Motion` plan node, there's just no such thing; Pg 8.2 also has no CTE support.* So your question isn't really about PostgreSQL at all... – Craig Ringer Mar 09 '13 at 08:02
  • @CraigRinger sorry for that. version ------------------------------------------------------------------------------ -------------------------------------------------------------------- PostgreSQL 8.2.15 (Greenplum Database 4.2.2.4 build 1 Community Edition) on x86 _64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Oct 17 2012 1 1:52:28 – Yang Mar 09 '13 at 15:06

2 Answers2

3

In PostgreSQL, a CTE is an optimisation barrier that forces materialization of the CTE term. This will hopefully change in future releases, but a backward compatibility option would be provided if it ever does.

If you do an explain analyze on your query, you will find that the dept_count term is being executed in a separate plan tree as a CTE Scan. It's accumulated into a tuplestore just like materialized results, IIRC.

Update: The author is actually using Greenplum. The above statement does not appear to be true for Greenplum, who implemented their own CTE support on top of a PostgreSQL 8.2 codebase or did a non-straightforward backport of the 8.4 CTE feature with significant changes. In Greenplum it looks like you might have to use a temporary table unless there are additional Greenplum-specific features available.

Craig Ringer
  • 307,061
  • 76
  • 688
  • 778
  • @Yang Answer updated. In short, you asked a tagged `PostgreSQL` and you're really using Greenplum not PostgreSQL so I can't really help you in detail with Greenplum-specific problem; maybe ask on http://www.greenplum.com/communities/forums/ . I suspect you're out of luck and will need to use temp tables. – Craig Ringer Mar 10 '13 at 01:34
1

Temporary table

If you are looking for a temporary table that persists for the duration of your session, use an actual TEMPORARY TABLE.

CREATE TEMPORARY TABLE t AS
SELECT deptno, COUNT(*) AS dept_count
FROM   emp
GROUP  BY deptno;

SELECT ...
FROM t ...

CREATE [TEMPORARY] TABLE AS in the manual.

However, there is no "global" temporary table in PostgreSQL. A temporary tables is only visible to the user who created it and only for the duration of the session it was created in.

CTEs are only visible in the query they are part of. Never beyond that.

Temp table exclusive to a single query

To restrict the visibility of your temp tables to a single query, put them into a transaction and add ON COMMIT DROP, which drops the temp table automatically at the end of the transaction:

BEGIN;
CREATE TEMP TABLE t ON COMMIT DROP AS
SELECT ...

The only use case I can think of, where this would make sense: if you want to create indexes on a huge temporary table:

CREATE INDEX ON t(col1);

SELECT ..
FROM t ...;

ROLLBACK;

Or (doesn't make a difference here):

COMMIT;

If you use ROLLBACK, you can also just use a temp table without ON COMMIT DROP since everything is rolled back anyways.

Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • Yes, this can solve my problem, but it would be better if I can have a temporary table visible only within one query since I have several queries, each requiring a temporary table. It would be quite inconvenient to create and drop tables frequently. – Yang Mar 09 '13 at 01:57
  • @Yang: OK, but that's what a CTE does out of the box: `a temporary table visible only within one query`. Unlike a subquery a CTE is visible at *any* query level of that query. What's the confusion here? – Erwin Brandstetter Mar 09 '13 at 01:58
  • A CTE does not materialize the table. I'm joining three very large tables Postgresql runs out of storage without materializing the first join. – Yang Mar 09 '13 at 01:59
  • @Yang: I added ways to create a temp table for *one* query only. But I am not sure this will give you any advantage over just using a CTE. – Erwin Brandstetter Mar 09 '13 at 02:06