0

Basically, is the code below efficient (if I cannot use @ variables in MonetDB), or will this call the subqueries more than once each?

CREATE VIEW sys.share26cuts_2007 (peorglopnr,share26cuts_2007) AS (
SELECT peorglopnr, CASE WHEN share26_2007 < (SELECT QUANTILE(share26_2007,0.25) FROM sys.share26_2007) THEN 1
                        WHEN share26_2007 < (SELECT QUANTILE(share26_2007,0.5) FROM sys.share26_2007) THEN 2
                        WHEN share26_2007 < (SELECT QUANTILE(share26_2007,0.75) FROM sys.share26_2007) THEN 3
                        ELSE 4 END AS share26cuts_2007
FROM sys.share26_2007
);

I would rather not use a user-defined function either, though this came up in other questions.

Community
  • 1
  • 1
László
  • 3,914
  • 8
  • 34
  • 49
  • 1
    I know that the `CASE` statement will be executed once for each row in SQLServer but I have no idea about monetdb. – user2989408 Jun 12 '14 at 17:38
  • Pretty sure you'll want to store those `QUANTILE()` values in a table so they're not being calculated multiple times. – Hart CO Jun 12 '14 at 17:46
  • @user2989408 Thanks, but at least on SQLServer, what you say would mean that there is no caching on the subquery, nothing recognizes that this is the same thing? – László Jun 12 '14 at 17:46
  • @GoatCO I would appreciate some help with what you have in mind, how I could refer to those stored values in the query. I am happy to accept your answer if you write it up! – László Jun 12 '14 at 17:48
  • I would use **Execution Plan** to answer that question and compare total cost among trials. – BI Dude Jun 12 '14 at 18:01
  • @TadasV Thanks, I don't think that's available in MonetDB. – László Jun 12 '14 at 18:04
  • @BIDude I don't think that's available in MonetDB. – László Jun 12 '14 at 19:14
  • I put the conclusion from here into an answer, but there are other errors with the quantiles, sadly. – László Jun 12 '14 at 21:19

2 Answers2

0

As e.g. GoatCO commented on the question, this is probably better avoided. The SET command that MonetDB support can be used with SELECT as in the code below. The remaining question is why all quantiles are zero where my data is surely not (I also got division by zero errors before using NULLIF). I show more of the code now.

CREATE VIEW sys.over26_2007 (personlopnr,peorglopnr,loneink,below26_loneink) AS (
SELECT personlopnr,peorglopnr,loneink, CASE WHEN fodelsear < 1981 THEN 0 ELSE loneink END AS below26_loneink
FROM sys.ds_chocker_lev_lisaindivid_2007
);
SELECT COUNT(*) FROM over26_2007;

CREATE VIEW sys.share26_2007 (peorglopnr,share26_2007) AS (
SELECT peorglopnr, SUM(below26_loneink)/NULLIF(SUM(loneink),0)
FROM sys.over26_2007
GROUP BY peorglopnr
);
SELECT COUNT(*) FROM share26_2007;

DECLARE firstq double;
SET firstq = (SELECT QUANTILE(share26_2007,0.25) FROM sys.share26_2007);
SELECT firstq;
DECLARE secondq double;
SET secondq = (SELECT QUANTILE(share26_2007,0.5) FROM sys.share26_2007);
SELECT secondq;
DECLARE thirdq double;
SET thirdq = (SELECT QUANTILE(share26_2007,0.275) FROM sys.share26_2007);
SELECT thirdq;
CREATE VIEW sys.share26cuts_2007 (peorglopnr,share26cuts_2007) AS (
SELECT peorglopnr, CASE WHEN share26_2007 < firstq THEN 1
                        WHEN share26_2007 < secondq THEN 2
                        WHEN share26_2007 < thirdq THEN 3
                        ELSE 4 END AS share26cuts_2007
FROM sys.share26_2007
);
SELECT COUNT(*) FROM share26cuts_2007;
László
  • 3,914
  • 8
  • 34
  • 49
0

About inspecting plans, MonetDB supports:

  • PLAN to see the logical plan
  • EXPLAIN to see the physical plan in terms of MAL instructions
  • TRACE same as EXPLAIN, but actually execute the MAL plan and return statistics for all instructions.

About your question on repeating the subqueries, in principle nothing will be repeated and you will not need to take care of it explicitly. That's because the default optimization pipeline includes the commonTerms optimizer. Your SQL will be translated to a sequence of MAL instructions, with duplicate calls. MAL is designed to be simple: many short instruction calls, a bit like assembly, which operate on columns, not rows (hence don't apply the same reasoning you would use for SQL Server when you think of execution efficiency). This makes it easier to run some optimizations on it. The commonTerms optimizer will detect the duplicate calls and reuse all results that it can. This is done per-column. So you should really be able to run your query and be happy.

However, I said in principle. Not all cases will be detected (though most will), plus some limitations have been introduced on purpose. For example, the search-space for detecting duplicates is a window (too small for my taste - I have removed it altogether in my installations) over the whole MAL plan: if the duplicate instruction is too far down the plan it won't be detected. This was done for efficiency. In your case, that single query isn't that big, but if it is part of a longer chain of views, then all these views will compile into a large MAL plan - which might make commonTerms less effective - it really depends on the actual queries.

cornuz
  • 2,678
  • 18
  • 35