Fast SQL returning AVG for multiple column-condition pairs

Question

Let's suppose there's a table like this:

---------------------------------------------------------------------
| id | val_1 | val_2 | ... | val_n | attr_1 | attr_2 | ... | attr_n |
---------------------------------------------------------------------

val_i  : DOUBLE PRECISION
attr_i : INTEGER

I need to get every AVG( "val_i" ) where "attr_i" equals some value (the same for every "attr"). If there's no matches for some "attr_i" then AVG( "val_i" ) should be NULL, e.g.:

---------------------------------------
| val_1 | val_2 | val_3 | ... | val_n | 
---------------------------------------
| NULL  |  1.5  | NULL  | ... | 12.74 |
---------------------------------------

The following SQL seems to work as expected:

SELECT
  AVG( t1."val_1" ),
  AVG( t2."val_2" ),
...
  AVG( tn."val_n" )
FROM "test" t1
FULL OUTER JOIN "test" t2 ON ( t2."attr_2" = t1."attr_1")
FULL OUTER JOIN "test" t3 ON ( t3."attr_3" = t1."attr_1")
...
FULL OUTER JOIN "test" tn ON ( tn."attr_n" = t1."attr_1")
WHERE
     ( t1."attr_1" = some_value )
  OR ( t2."attr_2" = some_value )
...
  OR ( tn."attr_n" = some_value )
;

But it's too slow. For i == 4 and record count == 100 I had to interrupt operation after ~40 min.

So is there a faster way? The best would be to have single SQL (not a stored procedure) returning single row. It should take a few seconds maximum for i == 6 and record count > 1000.

Is `SELECT AVG(IIF(t1."attr_i" = some_value, t1."val_i", NULL)),... FROM "test" t1` what you are looking for? — BrakNicku, Nov 25 '19 at 07:52
You have to re-do the table. "Wide tables" like this are not only against Database Normalization rules, as you just learned trying to do so naturally looking averages, they do not scale up well, especially on multi-version architecture of IB/FB family. When you update such a table changing ONE single column, Firebird has to write onto disk the WHOLE row. And when you `select` one single column, Firebird again has to read from disk the WHOLE row. So, better of all rework this turned side down table into classic id/val/attr scheme with id/attr being Primary Key — Arioch 'The, Nov 25 '19 at 09:21
Then just do `select id, attr, avg(val) from ... group by 1,2` - simple. If some of your legacy software would be dependent upon this pivot layout of data - you may create a `VIEW` simulating the "wide" design of the table. But notice that once you maybe will add more attr types in future - you may just run out of columns! — Arioch 'The, Nov 25 '19 at 09:23
@BrakNicku what if that "some_value" would in some rows be met in "attr_k" instead of "attr_i" ? — Arioch 'The, Nov 25 '19 at 09:26
@Arioch'The as I undestand the question: value in n-th `val` column should be included in n-th average if n-th `attr` meets condition. So to answer your question - using the method I suggested that row would be included in k-th average and not in i-th. — BrakNicku, Nov 25 '19 at 10:01
@Arioch'The not sure what you mean, select from single table with n (6 in OP) simple expressions like the one in my comment, no combinations at all... — BrakNicku, Nov 25 '19 at 11:40
@BrakNicku it is already 6 selects instead of one, now assume each of those 6 columns may on average had N different interesting values, and you already have 6xN combinations. — Arioch 'The, Nov 25 '19 at 14:10
@Arioch'The one select, six columns (exact copies of the first comment with i=1,2,3,4,5,6), no joins (but full table scan), one parameter (interesting value). I wonder if our understanding of this Q is the same (but OP seemed to confim I got it right). — BrakNicku, Nov 25 '19 at 14:30
@BrakNicku TS confirms your grasping down there under GL's answer — Arioch 'The, Nov 25 '19 at 16:35

Gordon Linoff · Answer 1 · 2019-11-25T13:37:40.150

1

Your issue is that the same attribute can be in different columns, then you have an issue with your data model. You can work around this by unpivoting and aggregating:

select attr, avg(val)
from ((select t.id, t.attr_1 as attr, t.val_1 as val
       from t
      ) union all
      (select t.id, t.attr_2 as attr, t.val_n as val
       from t
      ) union all
      . . .
      (select t.id, t.attr_n as attr, t.val_n as val
       from t
      )
     ) t
group by attr;

In this case the result set is in rows, not columns.

You can filter either in the outer where or in the subqueries. Filtering in the subqueries might have somewhat better performance -- I don't know enough about Firebird to know if filtering conditions are "pushed down" into the union all subqueries.

edited Nov 25 '19 at 13:37

answered Nov 23 '19 at 17:51

Gordon Linoff

1,242,037
58
646
786

It doesn't work as I expect. It combines all `val_i` into one column and returns its `AVG`, whereas I need separate `AVGs` for each `val_i`. – Old Skull Nov 25 '19 at 05:18
@OldSkull add `id` to all those selects above – Arioch 'The Nov 25 '19 at 09:17
@Arioch'The, that "splits" resultt by `id`, but still combines all `val` into one column – Old Skull Nov 25 '19 at 13:36
@OldSkull . . .The averages are on *separate rows*. The answer explains that -- or at least it tries to. – Gordon Linoff Nov 25 '19 at 13:38
@GordonLinoff, `id` is not a primary key/autoinc. And it may repeat. So e.g. if I have 5 different `id` and 4 `val` columns, I will get 5 separate rows. How can I extract my 4 averages? – Old Skull Nov 25 '19 at 13:46
@OldSkull if `i` is somehow meaningful for you, then you can ass `i` column into a refactored "short" table and grouping by i+attr pair, would be the best DB structure, and if you insist on wide table then BrakNiku's suggestion should work for a while. However you really better add sample data and desired output for it into the question, to showcase your description with specific figures. As of now we are all just doing guesswork over what you meant – Arioch 'The Nov 25 '19 at 14:19
@OldSkull . . . I repeat that your sample query does not involve `id` at all. But you can aggregate by `id` in the outer query in this answer. – Gordon Linoff Nov 25 '19 at 15:08
@Arioch'The, `i` means _i_-th index. If there're columns `val_1`, `val_2`, ... , `val_N`, then `val_i` is just a _i_-th column. The goal is to get N separate `AVG` where _i_-th `val` depends only on _i_-th `attr`. @BrakNicku got me right. But if you'd like to see real data/results, I'll try to add samples tomorrow. – Old Skull Nov 25 '19 at 16:16
@OldSkull and there seems not much difference between pairs of val/attr column, pair #1 and #2 are like one another, so are pairs #2 and #3, etc. They seem to represent slightly different manifestations of the same idea. And thus in SQL model they better fit into rows. In SQL it is hard to add/delete columns and easy to add/delete rows. See Normal Forms. Make table `[ id / i / val / attr ]` and you had transposed your 6 column pairs into 6 rows. Now you can apply `SELECT AVG(val), i, attr ...GROUP BY i, attr` naturally, without all those `IIF` gimmicks. And many other queries too. – Arioch 'The Nov 25 '19 at 16:34
@GordonLinoff it seems he was to ignore ID but to group values per-columns-pair (1-2-3-4-5-6) and per attr-# column values. Now I wonder if the same value can be met in different columns, like maybe `attr_1` in row #17 might be equal to `attr_5` in row #42 – Arioch 'The Nov 25 '19 at 16:40

Fast SQL returning AVG for multiple column-condition pairs

1 Answers1