0

****EDIT**** Adding SQL Fiddle Link HERE I created the scenario in SQL Fiddle to show what I am trying to accomplish. I found it very odd that it produces accurate results in SQL Fiddle yet produces the results below using my DB.

However for the sake of proving that the seasons truly exist here is a select tvseasons join on tvseries: TVSeason_Join_TVSeries

Running this query:

SELECT TVSeriesHasTVSeason.tvSeriesID, TVSeason.tvSeasonID, TVSeason.title, Users.userID, 
    CASE
        WHEN UserHasWatchedTVSeason.tvSeasonID IS NULL THEN 'No'
        ELSE 'Yes'
    END as watched
FROM TVSeason
CROSS JOIN Users
LEFT JOIN UserHasWatchedTVSeason
    ON  TVSeason.tvSeasonID = UserHasWatchedTVSeason.tvSeasonID
    AND Users.userID = UserHasWatchedTVSeason.userID
RIGHT JOIN TVSeriesHasTVSeason
    ON TVSeason.tvSeasonID = TVSeriesHasTVSeason.tvSeasonID 

Returns:

515 1769    1000 Ways to Die Season 1   3   Yes
515 1770    1000 Ways to Die Season 2   3   Yes
515 1772    1000 Ways to Die Season 4   3   Yes
515 1773    1000 Ways to Die Season 5   3   Yes
516 1774    2 Stupid Dogs Season 1      3   No
516 1775    2 Stupid Dogs Season 2      3   No
517 1777    24 Season 2                 3   Yes
517 1779    24 Season 4                 3   Yes
517 1780    24 Season 5                 3   Yes
517 1781    24 Season 6                 3   Yes
517 1782    24 Season 7                 3   Yes

The season id are consecutive you can clearly see season 3 of 1000 ways to die is not being returned and there are also a couple seasons of 24 also not being returned.

I have no idea what is wrong with this query?

****EDIT** I believe I found a working solution to the issue:**

SELECT x.*,
    CASE
        WHEN UserHasWatchedTVSeason.tvSeasonID IS NULL THEN 'No'
        ELSE 'Yes'
    END as watched
FROM
(SELECT 
    TVSeries.tvSeriesID, TVSeries.title,
    TVSeriesHasTVSeason.tvSeasonID,
    Users.userID
FROM TVSeries
LEFT JOIN TVSeriesHasTVSeason
    on tvseries.tvSeriesID = TVSeriesHasTVSeason.tvSeriesID
LEFT JOIN TVSeason
    on tvseason.tvSeasonID = TVSeriesHasTVSeason.tvSeriesID
CROSS JOIN Users)x
LEFT JOIN UserHasWatchedTVSeason
    on x.tvSeasonID = UserHasWatchedTVSeason.tvSeasonID
    AND x.userID = UserHasWatchedTVSeason.userID
Kairan
  • 5,342
  • 27
  • 65
  • 104
  • What is the input data? What are you trying to do? – Mosty Mostacho Oct 31 '13 at 05:01
  • @MostyMostacho Trying to get list of all Tv Series, each season in the series, and see if a specific user has watched each season ... – Kairan Nov 01 '13 at 00:18
  • Not sure what you mean by input data – Kairan Nov 01 '13 at 00:25
  • You have input data, run a SQL statement and that produces an output you're not interested in. You pasted that *wrong* output and the *wrong* query but you didn't provide the input data that generates the output. – Mosty Mostacho Nov 01 '13 at 00:27

2 Answers2

1

Assuming that all seasons are in TVSeason, the most plausible explanation would be that the seasons you mention are missing from TVSeriesHasTVSeason. Note that the right join does nothing but eliminate rows missing from TVSeriesHasTVSeason, as no data from that table is used anywhere else. By the way, note that you don't need the cross join. Assuming your tables are sane you can take the user IDs from UserHasWatchedTVSeason.


An update, based on the comments and on the edits to the question. In the discussion below the line, you said:

I guess I am looking to do this in a View that shows the Series # and TV Season # and User ID # and watched = yes, no, or partially watched series

Below is a query that, given sane data, would match your requirement:

SELECT WatchCount.tvSeriesID, WatchCount.userID,
  CASE
    WHEN WatchCount.NWatched = 0 THEN 'No'
    WHEN WatchCount.NWatched = SeasonCount.NSeasons THEN 'Yes'
    ELSE 'Partial'
    END AS Watched
FROM (
  SELECT SR.tvSeriesID, U.userID,
    COUNT(UxSN.tvSeasonID) AS NWatched
  FROM TVSeries SR
  CROSS JOIN Users U
  LEFT JOIN TVSeriesHasTVSeason SRxSN
    ON SRxSN.tvSeriesID = SR.tvSeriesID
  LEFT JOIN UserHasWatchedTVSeason UxSN
    ON UxSN.userID = U.userID
    AND UxSN.tvSeasonID = SRxSN.tvSeasonID
  GROUP BY SR.tvSeriesID, U.userID
  ) WatchCount
INNER JOIN (
  SELECT SRxSN.tvSeriesID,
    COUNT(SRxSN.tvSeasonID) AS NSeasons
  FROM TVSeriesHasTVSeason SRxSN
  GROUP BY SRxSN.tvSeriesID
  ) SeasonCount
ON SeasonCount.tvSeriesID = WatchCount.tvSeriesID

A few important observations:

  • Your comment mentioned returning both the series and the season IDs along with the Watched field. That, however, wouldn't work well: Watched is a property of the user-series combination; and so a query returning it should have the season data grouped away already (the alternative leads to returning a lot of duplicated data).

  • The evidence you provided strongly suggests that the TVSeriesHasTVSeason table has missing or wonky rows for a few seasons. This query does not account for that; therefore, you will likely need an extra left join on TVSeasons (as in your answer) or, preferably, to check your data and figure out what is wrong with TVSeriesHasTVSeason.

  • TVSeriesHasTVSeason seems unnecessary; if the schema is under your control I suggest you to merge it with TVSeason. Every season has exactly one series, and so the association would be more naturally done through an extra foreign key in TVSeason. Separate association tables are best used with many-to-many relations, or with optional fields.

  • While there is a cross join in this query, it is between TVSeries and Users, which should result in far fewer result rows than one between TVSeason and Users. Looking from a higher-level point of view, the cross join between TVSeries and Users expresses your desired result (i.e. all combinations between series and seasons), while a cross join between TVSeason and Users generates a lot of extra data (the individual 'Yes' and No values) which will be thrown away (as you are only interested in the counts).

duplode
  • 33,731
  • 7
  • 79
  • 150
  • Tried changing it to a left join still has the issue... i seem to lose rows once I add the LEFT JOIN UserHasWatchedTVSeason ON TVSeason.tvSeasonID = UserHasWatchedTVSeason.tvSeasonID AND Users.userID = UserHasWatchedTVSeason.userID – Kairan Nov 01 '13 at 00:23
  • By that you mean the problem remains if you *remove* the right join, leaving only the cross join and the left join on `UserHasWatchedTVSeason`? Also, note that the cross join is not necessary - see the edited answer. – duplode Nov 01 '13 at 02:40
  • If there are 500 tv seasons and 2 users i should return 1000 records, obviously it will have nulls in the areas where a user has not watched a tv season. If I use your example solution I am getting 500 + x, where x is the number of season the 2nd user has watched – Kairan Nov 05 '13 at 01:12
  • @Kairan Indeed, there is a difference - however, it might play to your advantage. Cross joins always raise suspicion: if you had 1000 seasons and 1000 users, the resulting table would have a million rows; such a large output is likely unnecessary. If you change the left join in my query to an inner join, the result will only contain the user-season pairs for which the user has watched the season; and likely that will be enough for your further analysis and aggregation (to find out whether an user has watched a season, just see if the combination is in the output). – duplode Nov 05 '13 at 01:52
  • @Kairan By the way, did removing the right join solve the original issue? – duplode Nov 05 '13 at 01:53
  • It is easy to find out which user watched a Season, its harder to find out which user watched a "series" = yes, no, or partially as this information isn't explicitly stored anywhere. I have to rely on the user/tvseries/tvseason relationships to determine if a particular user has watched a series yes, no, partial. I plan to create a View to store the series / user / watched – Kairan Nov 05 '13 at 02:01
  • @Kairan not necessarily. You can use `GROUP BY` and `COUNT` to find out, for instance, how many seasons of a series each user has watched (from the output of my latest suggestion) and how many seasons there are for each series (from the `TVSeason` table). – duplode Nov 05 '13 at 02:08
  • I guess I am looking to do this in a View that shows the Series # and TV Season # and User ID # and watched = yes, no, or partially watched series – Kairan Nov 05 '13 at 02:20
  • I added a solution, Im going to +1 for your effort. Perhaps you might take a look at my solution and tell me why you think it is working whe my original solution did not work. This might help others that view the solution. – Kairan Nov 05 '13 at 03:20
  • @Kairan I expanded the answer to cover the issues you raised. – duplode Nov 05 '13 at 04:17
0

This is a working solution:

SELECT x.*,
    CASE
        WHEN UserHasWatchedTVSeason.tvSeasonID IS NULL THEN 'No'
        ELSE 'Yes'
    END as watched
FROM
    (SELECT 
        TVSeries.tvSeriesID, TVSeries.title,
        TVSeriesHasTVSeason.tvSeasonID,
        Users.userID
    FROM TVSeries
    LEFT JOIN TVSeriesHasTVSeason
        on tvseries.tvSeriesID = TVSeriesHasTVSeason.tvSeriesID
    LEFT JOIN TVSeason
        on tvseason.tvSeasonID = TVSeriesHasTVSeason.tvSeriesID
    CROSS JOIN Users)x
LEFT JOIN UserHasWatchedTVSeason
    on x.tvSeasonID = UserHasWatchedTVSeason.tvSeasonID
    AND x.userID = UserHasWatchedTVSeason.userID

My thought is that in my original post that I was losing my connection in the CROSS JOIN when I referenced it in later JOINS.

I would hope someone might be able to tell me WHY this worked exactly as it is still a little unclear to me.

Also to expand my answer to return 'yes', 'no', 'partially watched':

SELECT *
FROM
    (SELECT userID, tvSeriesID, 
        CASE
            WHEN COUNT(tvSeriesID) = ABS(SUM(watched)) 
                AND SUM(watched) > 0 THEN 'Yes'
            WHEN COUNT(tvSeriesID) = ABS(SUM(watched)) 
                AND SUM(watched) < 0 THEN 'No'
            ELSE 'Partial'
        END as watched
    FROM
        (SELECT x.*,
            CASE
                WHEN UserHasWatchedTVSeason.tvSeasonID IS NULL THEN -1
                ELSE 1
            END as watched
        FROM
            (SELECT 
                TVSeries.tvSeriesID, TVSeries.title as tvSeriesTitle,
                TVSeriesHasTVSeason.tvSeasonID, 
                Users.userID
            FROM TVSeries
            LEFT JOIN TVSeriesHasTVSeason
                on tvseries.tvSeriesID = TVSeriesHasTVSeason.tvSeriesID
            LEFT JOIN TVSeason
                on tvseason.tvSeasonID = TVSeriesHasTVSeason.tvSeriesID
            CROSS JOIN Users
            )x
        LEFT JOIN UserHasWatchedTVSeason
            on x.tvSeasonID = UserHasWatchedTVSeason.tvSeasonID
            AND x.userID = UserHasWatchedTVSeason.userID
        )y
    GROUP BY userID, tvSeriesID
    )z
ORDER BY userID, tvSeriesID
Kairan
  • 5,342
  • 27
  • 65
  • 104
  • The main difference I see is that you changed the right join of `TVSeriesHasTVSeason` to a left join. That fits my suspicion that there are missing seasons in `TVSeriesHasTVSeason` (but not on `TVSeason`), making the right join remove the corresponding rows. By the way, I am going to expand my answer based on our previous discussion. – duplode Nov 05 '13 at 03:45