4

Question similar to this one: MySQL: View with Subquery in the FROM Clause Limitation

I have the following shows table:

DROP TABLE IF EXISTS `shows`;
CREATE TABLE `shows` (
  `show_id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `show_type` int(11) unsigned DEFAULT NULL,
  `show_year` int(11) unsigned DEFAULT NULL,
  PRIMARY KEY (`show_id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
INSERT INTO `shows` VALUES
(NULL, 1, 2014), -- 1
(NULL, 1, 2015), -- 2
(NULL, 2, 2015), -- 3
(NULL, 2, 2014); -- 4

I want to create a VIEW that will return show_id for the highest show_year for each show_type. Here's a nested query that works - returns 2 and 3:

SELECT s.show_id, s.show_year
FROM ( -- subquery for applying ORDER BY before GROUP BY
    SELECT *
    FROM shows
    ORDER BY show_year DESC
) s
GROUP BY show_type;
/*
+---------+-----------+
| show_id | show_year |
+---------+-----------+
|       2 |      2015 |
|       3 |      2015 |
+---------+-----------+
*/

Just for the reference I also tried the following query that seemed natural to me at first, but it ended up being no good in my case as shown below:

SELECT s.show_id, MAX(s.show_year)
FROM shows s
GROUP BY show_type;
/*
+---------+------------------+
| show_id | MAX(s.show_year) |
+---------+------------------+
|       1 |             2015 | <== show_id=1 does NOT have show_year=2015
|       3 |             2015 |
+---------+------------------+
*/

Now creating a VIEW - based on the nested query above (the first SELECT) the problem is that a view will not accept a subquery.

So I am using two views instead. One inside of another.

The first one just sorts the table by show_year DESC:

CREATE ALGORITHM=UNDEFINED DEFINER=`root`@`%` SQL SECURITY DEFINER VIEW `shows_desc` AS
SELECT `s1`.`show_id` AS `show_id`,
    `s1`.`show_type` AS `show_type`,
    `s1`.`show_year` AS `show_year`
FROM `shows` `s1`
ORDER BY `s1`.`show_year` DESC;

The second one is supposed to do the GROUP BY on the first one:

CREATE ALGORITHM=UNDEFINED DEFINER=`root`@`%` SQL SECURITY DEFINER VIEW `shows_grouped` AS
SELECT `s2`.`show_id` AS `show_id`,
    `s2`.`show_year` AS `show_year`
FROM `shows_desc` `s2`
GROUP BY `s2`.`show_type`;

However, to my surprise it returns incorrect rows. As if it GROUPed ignoring the ORDER of the subview:

+---------+-----------+
| show_id | show_year |
+---------+-----------+
|       3 |      2015 |
|       1 |      2014 | <== why?
+---------+-----------+

Clues?

P.S.: SQL fiddle to poke: http://sqlfiddle.com/#!2/e506d4/5

Community
  • 1
  • 1
Geo
  • 12,666
  • 4
  • 40
  • 55
  • Re your two views: Why do you think the GROUP BY type of the ORDER BY year only returns rows with maximum year per type? Why do you think the sort order of a table or view mentioned in a query is *not* ignored? – philipxy Nov 16 '14 at 09:49
  • E.g. I have 3 shows: 2012, 2013, and 2014. First ORDER BY DESC will bring the 2014 record on the top. Then GROUP BY will get rid of 2012 and 2013. At least this is what I see in the results of the nested query (but not in the results of the nested views) – Geo Nov 17 '14 at 14:48
  • This answer to "why" is too vague to show your errors. The GROUP BY does *not* "get rid of" any particular years. Gordon Linoff's answer explains that your GROUP BY's SELECT does not return any particular year value; it does not do what you (presumably) think it does; it is wrong; it just happens to give the right answer this time for this data. Also, put in your question what result you want when a type has two ids with the same high year. – philipxy Nov 17 '14 at 22:44
  • Correct, it should end up with just two ids, however not necessarily with "the same high year". In fact, based on this particular dataset it should be two different years. In other words, the result should have as many records as there are distinct `show_type` values (hence GROUP BY). In my example - there are two shows of each type in each year. I want to end up with 2 records, each representing the *latest* show per `show_type`. Please let me know if this is still unclear, or if a differnt dataset can help. Thank you very much for your help! Let me know if we should open a chat session? – Geo Nov 18 '14 at 21:11
  • The "two ids with the same high year" in my comment are referring to possible *input*, where a single type has them both. And like Gordon & I said TWO IDS OF ONE TYPE CAN HAVE THE SAME (HENCE, "LATEST"?) YEAR (JUST AS IN YOUR EXAMPLE!) SO IF YOU WANT JUST ONE OF TWO SUCH IDS OUTPUT THEN EXPLAIN WHICH ONE IN YOUR QUESTION. (Maybe "latest" means "with the largest id"?--you have NOT BEEN CLEAR.) You do not seem to be reading very carefully what we are telling & asking you. And you are not writing clearly. You don't need chat (yet), just read carefully and respond carefully. – philipxy Nov 19 '14 at 01:49
  • oh sorry, by "latest" I meant "highest year". What's up with CAPS lol?? – Geo Nov 20 '14 at 20:00
  • My use of SHOUT CASE was just trying to get your attention since your comments still didn't show understanding after Gordon & I had told you multiple times. AND STILL DON'T show understanding. Read our messages and comments again. Within one type there can be MULTIPLE IDS with that type and its maximum/latest year. EDIT INTO YOUR QUESTION that IN THAT CASE YOU (apparently) ONLY WANT ONE ID and WHICH ONE. Ie what unstated rule did you use to return id 3 and only id 3 for type 2 when id 4 also has the highest year 2014 for that type? – philipxy Nov 21 '14 at 00:55
  • guys sorry, just added an edit: the fourth record for this example meant actually to be `(NULL, 2, 2014); -- 4` – Geo Nov 21 '14 at 17:41
  • But what if a type has two ids with the same max year. – philipxy Nov 21 '14 at 18:14
  • good question @philipxy - then it should still show one (any) row. So GROUP BY is needed. However, most important is to ensure the row's *integrity*, i.e. values from all columns must belong to the **same** record. – Geo Nov 21 '14 at 18:30
  • You cannot get just one id per type & year just selecting from GROUP BY. (As I guess you realized facepalming below.) That is why Gordon & I found the maximum id of a subquery for rows with a given type and max year. One could also use ORDER BY and LIMIT 1 on that subquery for an arbitrary id. Re NULL: Then show_id is not NOT NULL and not PRIMARY KEY and show_ids can't be generated via VALUES NULL. – philipxy Nov 21 '14 at 19:27

3 Answers3

4

Lew me focus on this:

I want to create a VIEW that will return show_id for the highest show_year for each show_type. Here's a nested query that works - returns 2 and 3:

Here is one way, assuming that the show_id increments so the largest show_id is in the latest year:

select show_type, max(show_year) as show_year, max(show_id)
from shows
group by show_type;

If not, try this:

select show_type, max(show_year) as show_year,
       substring_index(group_concat(show_id order by show_year desc), ',', 1) as show_id
from shows
group by show_type;

The reason your other queries don't work is because you have "wishful" understanding of how MySQL works. You are using an extension to MySQL that is explicitly said not to work in the documentation. That is, the columns in the select are not in aggregation functions and are not in the group by clause (and are not functionally dependent, but that is a very advanced concept that isn't relevant for this discussion):

In this case, the server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate, which is probably not what you want.

EDIT:

The following should also work for a view:

select s.*
from shows s
where s.show_year = (select max(s2.show_year) from shows s2 where s2.show_type = s.show_type);

EDIT II:

If you want only one row for each show_type and show_id is unique, then this should work:

select s.*
from shows
where not exists (select 1
                  from shows s2
                  where s2.show_type = s.show_type and
                        (s2.show_year > s.show_year or
                         s2.show_year = s.show_year and s2.show_id > s.show_id
                        )
                 );
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Thank you Gordon. `show_id` increments unreliably. But even if it was reliable, what if I had more columns in that table? MAX each? Wouldn't work. The second solution I don't quite understand, but still it returns 2 and 4 (instead of desired 2 and 3). Besides, somehow those two results seem to be presented as BLOBs and in HEX look like 32 and 34 :-/ – Geo Nov 06 '14 at 19:27
  • Re:EDIT: I was [to my surprise] able to create a view based on this *nested* query, however: a) It returns 3 show_ids: 2, 3, and 4 instead of just 2 and 3; b) It fails to return other years (e.g. if you added another record into the original shows table as show_type=3 and show_year=2014) – Geo Nov 06 '14 at 21:04
  • @Geo . . . It returns *three* show rows because there are ties for one of the show types. Hence, there are *three* the fit your criteria. – Gordon Linoff Nov 06 '14 at 21:25
  • Did I state the question unclearly maybe? All I need is a single row (all data associated with that particular show_id record) for EACH show_type. There are 2 show_types. Thus I am expecting only 2 rows in the end. – Geo Nov 18 '14 at 19:36
  • 1
    @Geo . . . Yes, your question is unclear. This query shows the rows that have the maximum show year for each show. What do you want when there is only one show? – Gordon Linoff Nov 18 '14 at 20:12
  • Ok, let me try restating it. Look at my first query - `SELECT FROM (SELECT ORDER DESC) GROUP`. What I am trying to do is **#1** sort all the rows by `show_year` BACKWARDS. This makes sure that the most recent shows (the ones I am interested in) are on the top. Then I run **#2** GROUP which only picks the FIRST RECORD for each `show_type`, skipping all the other show_types with a duplicate `show_year`. Should we open a chat channel? – Geo Nov 18 '14 at 21:04
  • 1
    @Geo . . . You should understand how `group by` works. It does not choose the "first" rows for values. The lengthy description is here (http://dev.mysql.com/doc/refman/5.7/en/group-by-handling.html), but focus on the word "indeterminate". That is very different from the first value. – Gordon Linoff Nov 18 '14 at 23:22
  • look at the first comment "Posted by Kasey Speakman on October 3 2009 10:49pm" on this page http://dev.mysql.com/doc/refman/5.7/en/example-maximum-column-group-row.html – Geo Nov 20 '14 at 20:53
  • @Geo . . . I trust the documentation itself more on what is and is not allowed. The documentation is rather explicit on the use of "indeterminate". That comment is a very poor choice to put on the documentation page. – Gordon Linoff Nov 20 '14 at 21:25
  • @Geo Re "I was [to my surprise] able to create a view based on this nested query": Your own link says that what is disallowed is *"a FROM clause that contains a subquery"*. – philipxy Nov 21 '14 at 04:44
  • @philipxy . . . Subqueries are not allowed in the `from` clause. No queries in my answer have subqueries in the `from` clause. They are in the `where` clause. – Gordon Linoff Nov 21 '14 at 12:26
  • This is why `GROUP BY` is a bad way to do business. Don't expect it to do your sorting for you. That's what `ORDER BY` is for. – vapcguy Jul 20 '16 at 14:20
1

This query worked as a VIEW for me. Hope it helps anyone.

SELECT s3.*
FROM shows s3
WHERE s3.show_id IN (
  SELECT s1.show_id
  FROM shows s1
      LEFT JOIN shows s2
      ON s1.show_type = s2.show_type
      AND s2.show_year > s1.show_year
  WHERE s2.show_id IS NULL
  GROUP BY s1.show_type
);
/*
+---------+-----------+-----------+
| show_id | show_type | show_year |
+---------+-----------+-----------+
|       2 |         1 |      2015 |
|       3 |         2 |      2015 |
+---------+-----------+-----------+
*/

WHERE IS NULL - means there is no record that has show_year HIGHER than this one for each show_type

The whole schema and each query is shown here: http://sqlfiddle.com/#!2/f28510/3

Geo
  • 12,666
  • 4
  • 40
  • 55
  • ah! \*facepalm\* I think you are right. This one would only work if there are no shows of the same type (though with different IDs) for the same year. – Geo Nov 21 '14 at 18:34
  • @philipxy EDIT: how about adding a SELECT WHERE ID IN around? In this case I get solid rows based on the returned IDs (the use of `MAX` here is optional). Right? – Geo Nov 21 '14 at 21:37
  • I can barely make sense of that comment but: Your FROM...WHERE is the subquery I mentioned in a recent comment to your question "for rows with a given type and max year". Yes, for indeterminate id per type and max year instead of nesting it in GROUP BY with MAX you can nest it in GROUP BY with SELECT of id via MySQL non-standard mode-dependent select of an indeterminate value from non-grouped non-aggregated column. (Better to use non-standard mode-independent pseudo-aggregate ANY VALUE().) PS Using MAX the result table ids should now be 2 & 4. – philipxy Nov 21 '14 at 23:08
  • @philipxy it works great!! Added an sqlfiddle. Thanks! – Geo Nov 23 '14 at 23:16
0

I want to create a VIEW that will return show_id for the highest show_year for each show_type.

select s.show_id
from shows s
where s.show_year=
    (select max(st.show_year)
    from shows st
    where st.show_type=s.show_type)

(As in Gordon Linoff's answer.)

In general there can be many show_ids with the maximum show_year of a given show_type. (As he also pointed out in a comment.)

If you want to return just one of those, say so and say which one in your question. For the maximum one:

select s.show_id
from shows s
where s.show_year=
    (select max(st.show_year)
    from shows st
    where st.show_type=s.show_type)
and s.show_id=
    (select max(st.show_id)
    from shows st
    where st.show_type=s.show_type
    and st.show_year=s.show_year);

Using nested views for the ids with maximum year per type:

CREATE VIEW `shows_1` AS
SELECT `show_type` AS `show_type`,
    MAX(`show_year`) AS `show_year`
FROM `shows`
GROUP BY `show_type`;

CREATE VIEW `shows_ids` AS
SELECT `s`.`show_id`
FROM `shows` `s`
JOIN `shows_1` `s1`
ON `s`.`show_type`=`s1`.`show_type`
AND `s`.`show_year`=`s1`.`show_year`;

Or if you want the maximum show_id per show_type and maximum show_year:

CREATE VIEW `shows_id` AS
SELECT MAX(`s`.`show_id`) AS `show_id`
FROM `shows` `s`
JOIN `shows_1` `s1`
ON `s`.`show_type`=`s1`.`show_type`
AND `s`.`show_year`=`s1`.`show_year`
GROUP BY `s`.`show_type`,`s`.`show_year`;

So how can ORDER BY be applied before GROUP BY in a VIEW?

If you want to order the result of a SELECT then you must do that in its ORDER BY after its GROUP BY. So to ORDER BY before you GROUP BY you must use an outer SELECT with a GROUP BY. If he outer GROUP BY SELECT is in a view then the innner ORDER BY SELECT may have to be in another view named in the first.

But the order of a table or view named in a query is not kept through calculating its result. So why do you want to ORDER BY before GROUP BY? Why do you think your question's nested views should return maximum show_year per show_type?

philipxy
  • 14,867
  • 6
  • 39
  • 83