13

Although a couple of questions were already posted in SO about the difference between Over Partition By and Group By, I did not find a definitive conclusion about which performs better.

I set up a simple scenario at SqlFiddle, where Over (Partition By) seems to boast a better execution plan (I am not much familiar with them, however).

Is the amount of data in the tables supposed to change this? Does Over (Partition By) then ultimately performs better?

enter image description here

enter image description here

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
Veverke
  • 9,208
  • 4
  • 51
  • 95
  • I'm not sure what you're getting at. Sure, there are similarities between `partition by` and `group by`, but they're still two different operations with different use cases. And why do you consider the second execution plan to be better? Both have to do the same clustered index scans, but the second spends less relative time doing that, so it's obviously longer overall (about twice as long, just because of that distinct sort). – Luaan Sep 08 '15 at 09:08
  • 3
    There's no universal rule where one will perform better than the other. Set performance *goals*, then write simple code that solves the problem at hand. Then *measure* the performance. Only if the performance isn't acceptable should you then consider making changes. But if the query is *correct*, it's more likely that e.g. indexes will need changing than that changing between these two will make a difference. (If it *was* universally true that one outperformed the other and it was possible to mechanically transform queries between the two methods, the optimizer would already be doing that) – Damien_The_Unbeliever Sep 08 '15 at 09:39
  • @Luaan: From my point of view, a developer with 2 ways of doing something and getting the same result, it all boils down to: does one way perform better than the other ? If not, I will stick with the way more common Group By. A colleague just offered the select alternative with **over**, which I am happening to learn about now. – Veverke Sep 08 '15 at 09:39
  • @Damien_The_Unbeliever: That's an answer. Thanks – Veverke Sep 08 '15 at 09:42
  • The queries have different output so they perform different work. That said SQL Servers execution strategy for windowing functions is often not sophisticated and needlessly depends on the way the query is written. – usr Sep 08 '15 at 09:52
  • @usr: how do they have different outputs ? the output is 1-1, 2-100, 3-18 – Veverke Sep 08 '15 at 10:02
  • @Veverke the output depends on the data. The joins might bring in many rows per A-row. The group by eliminates those whereas in the other query the rows would be output. – usr Sep 08 '15 at 10:13
  • @usr: what about the `Distinct` in the 2nd ? – Veverke Sep 08 '15 at 10:18
  • 1
    @Veverke doesn't that apply after the aggregation? Different sums would make rows unique. That query is hard to comprehend. – usr Sep 08 '15 at 10:20
  • @usr: interesting insight, I honestly only thought about the distinct running prior to the window function. Now that should be another question :-). In any case, harder to comprehend or not, question still is... which will perform better. – Veverke Sep 08 '15 at 10:41
  • 1
    Well, my approach is a bit different - if there's two ways to do the same thing, I chose the one that's easier to read and understand. In this case, the `group by` clearly wins. Performance is only important when it actually makes a difference - it's a cost calculation like any other; there's a cost to writing performance-centric code (usually, losing maintainability) and a cost to writing maintenance-centric code (usually, losing performance). Unless there's a strong reason to prefer performance (usually only for a tiny part of the whole application), maintainability is the way to go. – Luaan Sep 08 '15 at 10:55
  • @Luaan: What you're saying makes perfect sense, but in the field of banking or insurance, for example, I don't think you will stick to such principle. Even if performance is not an issue now, one should take it into account as an *attempt* to avoid future problems. – Veverke Sep 08 '15 at 11:22
  • I doubt that "performance first" *as a general approach* makes sense *anywhere*. Even with e.g. spaceship computers, it's reliability (and predictability) first. You have to pick your battles - save your strength for where it can be used effectively. Banking isn't really special - in fact, I've seen systems for major banks that used *Java*, back in the day when it was interpreted (and indeed, very slow); in other cases, it was often something like COBOL, sometimes compiled, sometimes interpreted. The key has always been to find where performance is a primary concern, and where it isn't. – Luaan Sep 08 '15 at 11:32
  • That doesn't mean you should *ignore* performance concerns, of course. Thinking ahead is still absolutely crucial to have a maintainable application that actually works. But unless you're working on a very specific scenario (or your design is deeply flawed, I've been there too :D), this should only be a tiny portion of your application, not 90%. – Luaan Sep 08 '15 at 11:34
  • You didn't provide any data in the SQL Fiddle and I don't know why you think the second plan is better. I prefer `GROUP BY` in your case: easier to read, easier to understand, even if performance is *slightly* slower. It's not immediately clear to me what the `PARTITION BY` is doing. – Code Different Sep 08 '15 at 11:51
  • @ZoffDino: [Partition By in an Over clause](https://www.simple-talk.com/sql/learn-sql-server/window-functions-in-sql-server/) – Veverke Sep 08 '15 at 12:16
  • I know what the window functions do. They have their places but you are just aggregating numbers here. Why make it complicated? – Code Different Sep 08 '15 at 12:19
  • @ZoffDino: I learned about both over and **partition by** (what you asked for) in that article. Guys, stop losing focus on what was asked. The subject here is not "if I am free to complicate things, is it recommended to do so ?" - the question is clear: I want a performance conclusive statement about comparing both clauses performance, even if there is none. Damien_The_Unbeliever answers in this direction, while others entered different realms, such as "ideally, you should not do this or that". Let's be more precise here, after all we are computers is a somewhat a very mathematical field. – Veverke Sep 08 '15 at 13:02
  • @ZoffDino: If you missed the discussion above, this all started only after a colleague brought me the over alternative. I myself was happy with the Group By approach I proposed. Never we even entered the readability topic. – Veverke Sep 08 '15 at 13:03

1 Answers1

3

The execution plan for the windowed function is clearly inferior to the execution plan for the group by (the whole group by execution plan is included in the top right of the windowed function execution plan). SQL in general is better optimized for group by than using an over clause, however each has their uses. For example, if you wanted to output a row for each entry with the summation for the group, then windowing would probably make more sense (e.g. A.id|B.b1|sum B.b1 over A.id). Since you do not, it really doesn't. You're basically using a group by, then taking a distinct again, instead of just using the distinct that the group by implies.

Adam Martin
  • 1,188
  • 1
  • 11
  • 24
  • Did you mean *SQL in general is better optimized for group by* or over ? I mean you say the execution plan for over here is *clearly inferior*. – Veverke Sep 08 '15 at 13:53
  • SQL in general is better optimized for a group by versus an over, since an over includes additional information (hence your need for the distinct clause). You should really only use windowed functions when it is difficult to construct an efficient aggregate version of the query. – Adam Martin Sep 08 '15 at 14:17
  • Oh, when you say *inferior* you mean it is worse ? I thought you were using *inferior* denoting it costs less and thus is better... – Veverke Sep 08 '15 at 14:23
  • I used inferior to mean worse, yes – Adam Martin Sep 08 '15 at 14:36
  • What you say is what makes sense to me, but let's wait to see if more answers come. – Veverke Sep 08 '15 at 14:41
  • Upvoted for mentioning `You're basically using a group by, then taking a distinct again, instead of just using the distinct that the group by implies.`. I used `over partition by ...` for the same case. – andras Apr 03 '21 at 16:28