Cache greenplum query plan globally?

Question

I'd like to save planner cost using plan cache, since OCRA/Legacy optimizer will take dozens of millionseconds.

I think greenplum cache query plan in session level, when session end or other session could not share the analyzed plan. Even more, we can't keep session always on, since gp system will not release resource until TCP connection disconnected.

most major database cache plans after first running, and use that corss connections.

So, is there any switch that turn on query plan cache cross connectors? I can see in a session, client timing statistics not match the "Total time" planner gives?

Brendan Stephens · Accepted Answer · 2018-07-24T13:35:26.557

1

Postgres can cache the plans as well, which is on a per session basis and once the session is ended, the cached plan is thrown away. This can be tricky to optimize/analyze, but generally of less importance unless the query you are executing is really complex and/or there are a lot of repeated queries.

The documentation explains those in detail pretty well. We can query pg_prepared_statements to see what is cached. Note that it is not available across sessions and visible only to the current session.

When a user starts a session with Greenplum Database and issues a query, the system creates groups or 'gangs' of worker processes on each segment to do the work. After the work is done, the segment worker processes are destroyed except for a cached number which is set by the gp_cached_segworkers_threshold parameter.

A lower setting conserves system resources on the segment hosts, but a higher setting may improve performance for power-users that want to issue many complex queries in a row.

Also see gp_max_local_distributed_cache.

Obviously, the more you cache, the less memory there will be available for other connections and queries. Perhaps not a big deal if you are only hosting a few power users running concurrent queries... but you may need to adjust your gp_vmem_protect_limit accordingly.

For clarification: Segment resources are released after the gp_vmem_idle_resource_timeout. Only the master session will remain until the TCP connection is dropped.

edited Jul 24 '18 at 13:35

answered Jul 24 '18 at 13:29

Brendan Stephens

227
1
5

Very useful information, I‘d like detail my quesion go deeper. Seperate cache into "high level"/"Low level" cache, some kind like logic/phisic cache. I think the master node process QD->Analyzer->Optimizer->Plan, QE on segment will carry out. Before the quesion, I think bottleness of gp system should be CPU/Network/Master node, master lunch parallel worker use genetic algorithms seek for best plan. For the analyzed plan, it is expensive, should be cache utill pg_statistics updated or user triiger after ETL finished. – Lei Chi Jul 25 '18 at 11:55
For the redistributed/brodcasted data, I have read an artical that some extensions may cache tuples(Can't find that page any more~) , and should be finalized by hit ratio. – Lei Chi Jul 25 '18 at 11:58
For the more lower level, gpsystem/postgres already did lot of efforts. – Lei Chi Jul 25 '18 at 11:59
For the top level, I have plenty small queries which query statement schema not changed, I'd like blindly cache the plan until plan should be expired and renew. I'm happy with these query use same plan: SELECT XXX FROM tb WHERE alias='xxx' or SELECT XXX FROM tb WHERE alias/email='xxx' – Lei Chi Jul 25 '18 at 12:02
There's no way to force a plan to cache -- you can only tune the cache parameters to affect things like size, threshold and timing. – Brendan Stephens Jul 26 '18 at 14:41

Cache greenplum query plan globally?

1 Answers1