13

I am learning SQL trough a GalaXQL tutorial.

I can't figure out the following question (Exercise 12):

Generate a list of stars with star ids below 100 with columns "starname", "startemp", "planetname", and "planettemp". The list should have all stars, with the unknown data filled out with NULL. These values are, as usual, fictional. Calculate the temperature for a star with ((class+7)*intensity)*1000000, and a planet's temperature is calculated from the star's temperature minus 50 times orbit distance.

What is the syntax to write a LEFT OUTER JOIN query when you have sub-query items "AS" that you need to join together?

Here is what I have:

SELECT stars.name AS starname, startemp, planets.name AS planetname, planettemp 
FROM stars, planets 
LEFT OUTER JOIN (SELECT ((stars.class + 7) * stars.intensity) * 1000000 AS startemp 
                 FROM stars) 
             ON stars.starid < 100 = planets.planetid 
LEFT OUTER JOIN (SELECT (startemp - 50 * planets.orbitdistance) AS planettemp 
                 FROM planets) 
             ON stars.starid < 100

Here is the database schema (sorry, cant post the image file due to low rep):

CREATE TABLE stars (starid INTEGER PRIMARY KEY,
                    name TEXT,
                    x DOUBLE NOT NULL,
                    y DOUBLE NOT NULL,
                    z DOUBLE NOT NULL,
                    class INTEGER NOT NULL,
                    intensity DOUBLE NOT NULL);

CREATE TABLE hilight (starid INTEGER UNIQUE);

CREATE TABLE planets (planetid INTEGER PRIMARY KEY,
                      starid INTEGER NOT NULL,
                      orbitdistance DOUBLE NOT NULL,
                      name TEXT,
                      color INTEGER NOT NULL,
                      radius DOUBLE NOT NULL);

CREATE TABLE moons (moonid INTEGER PRIMARY KEY,
                    planetid INTEGER NOT NULL,
                    orbitdistance DOUBLE NOT NULL,
                    name TEXT,
                    color INTEGER NOT NULL,
                    radius DOUBLE NOT NULL);

CREATE INDEX planets_starid ON planets (starid);
CREATE INDEX moons_planetid ON moons (planetid);
Daniel Heilper
  • 1,182
  • 2
  • 17
  • 34
verkter
  • 758
  • 4
  • 15
  • 29
  • Actually, we'd prefer that it _wasn't_ an image, in preference of text; that makes it much easier to dump it into tools for running our own tests. Among other things, you appear to have a syntax error: `stars.starid < 100 = planets.planetid` (if the db is accepting it, I can guarantee you're not getting correct results). In the future, we'd appreciate a better description of what, specifically, isn't working, and what you _should_ be getting ("given this starting data, my query result should look like this..."). This is better written than some initial questions I've seen, though... – Clockwork-Muse May 25 '14 at 05:55
  • Great, will keep this in mind. Thank you. – verkter May 25 '14 at 05:56
  • "Here is the database schema (sorry, cant **post the image file** due to low rep)": You could post it now, ex post :-) – nutty about natty Sep 13 '19 at 07:29

3 Answers3

20

Lets build this up slowly.

First, lets see about getting just the information about stars:

SELECT name AS starName, (class + 7) * intensity * 1000000 AS starTemp 
FROM Stars
WHERE starId < 100

(this should look might familiar!)
We get a list of all stars whose starId is less than 100 (the WHERE clause), grabbing the name and calculating temperature. At this point, we don't need a disambiguating reference to source.

Next, we need to add planet information. What about an INNER JOIN (note that the actual keyword INNER is optional)?

SELECT Stars.name as starName, (Stars.class + 7) * Stars.intensity * 1000000 AS starTemp,
       Planets.name as planetName
FROM Stars
INNER JOIN Planets
        ON Planets.starId = Stars.starId
WHERE Stars.starId < 100

The ON clause is using an = (equals) condition to link planets to the star they orbit; otherwise, we'd be saying they were orbiting more than one star, which is very unusual! Each star is listed once for every planet it has, but that's expected.

...Except now we have a problem: Some of our stars from the first query disappeared! The (INNER) JOIN is causing only stars with at least one planet to be reported. But we still need to report stars without any planets! So what about a LEFT (OUTER) JOIN?

SELECT Stars.name as starName, (Stars.class + 7) * Stars.intensity * 1000000 AS starTemp,
       Planets.name as planetName
FROM Stars
LEFT JOIN Planets
       ON Planets.starId = Stars.starId
WHERE Stars.starId < 100

... And we have all the stars back, with planetName being null (and only appearing once) if there are no planets for that star. Good so far!

Now we need to add the planet temperature. Should be simple:

SELECT Stars.name as starName, (Stars.class + 7) * Stars.intensity * 1000000 AS starTemp,
       Planets.name as planetName, starTemp - (50 * Planets.orbitDistance) as planetTemp
FROM Stars
LEFT JOIN Planets
       ON Planets.starId = Stars.starId
WHERE Stars.starId < 100

...except that on most RDBMSs, you'll get a syntax error stating the system can't find starTemp. What's going on? The problem is that the new column alias (name) isn't (usually) available until after the SELECT part of the statement runs. Which means we need to put in the calculation again:

SELECT Stars.name as starName, (Stars.class + 7) * Stars.intensity * 1000000 AS starTemp,
       Planets.name as planetName, 
       ((Stars.class + 7) * Stars.intensity * 1000000) - (50 * Planets.orbitDistance) as planetTemp
FROM Stars
LEFT JOIN Planets
       ON Planets.starId = Stars.starId
WHERE Stars.starId < 100

(note that the db may actually be smart enough to perform the starTemp calculation only once per-line, but when writing you have to mention it twice in this context).
Well, that's slightly messy, but it works. Hopefully, you'll remember to change both references if that's necessary...

Thankfully, we can move the Stars portion of this into a subquery. We'll only have to list the calculation for starTemp once!

SELECT Stars.starName, Stars.starTemp,
       Planets.name as planetName, 
       Stars.starTemp - (50 * Planets.orbitDistance) as planetTemp
FROM (SELECT starId, name AS starName, (class + 7) * intensity * 1000000 AS starTemp 
      FROM Stars
      WHERE starId < 100) Stars
LEFT JOIN Planets
       ON Planets.starId = Stars.starId

Yeah, that looks like how I'd write it. Should work on essentially any RDBMS.

Note that the parenthesis in Stars.starTemp - (50 * Planets.orbitDistance) is only there for clarity for the reader, the meaning of the math would remain unchanged if they were removed. Regardless of how well you know operator-precedence rules, always put in parenthesis when mixing operations. This becomes especially beneficial when dealing with ORs and ANDs in JOIN and WHERE conditions - many people lose track of what's going to be effected.
Also note that the implicit-join syntax (the comma-separated FROM clause) is considered bad practice in general, or outright deprecated on some platforms (queries will still run, but the db may scold you). It also makes certain things - like LEFT JOINs - difficult to do, and increases the possibility of accidently sabotaging yourself. So please, avoid it.

Clockwork-Muse
  • 12,806
  • 6
  • 31
  • 45
  • So when we are running subquery and assigning it to alias we still have to refer to it as table.alias in SELECT? – verkter May 25 '14 at 06:57
  • Oh, no, you could name it `someLongNameToSummonElderGodsFromTheDeep` (subject to db limits) if you wanted to. If I had a join in the subquery, how would the system know which one I "had" to pick? The important thing is that I have to give _a_ name, and it doesn't matter much what. I simply reused `Stars` here because that was essentially the data you were still dealing with; for more complicated cases, just pick a name that best explains what the data represents. – Clockwork-Muse May 25 '14 at 07:18
  • One more thing. Why is there no comma after Stars in "WHERE starId < 100) Stars LEFT JOIN Planets"? Is it OK to omit commas? Thanks again. – verkter May 25 '14 at 23:27
  • Because I put `(LEFT) JOIN` there. It's not that the comma is optional, you _can't_ use it in this case. This is what I was referring to with the implicit-join syntax - just don't use it, in favor of listing out joins. (Note that essentially the subquery counts as a 'table' here, so you could do as you suggest, but frankly anbody who recommends that syntax should be scolded. It definitely shouldn't be taught as _how_ to do it.) – Clockwork-Muse May 25 '14 at 23:41
  • Sorry, I meant why there is no comma between ")" and Stars. – verkter May 26 '14 at 00:54
  • Ah. Because I'm not also referring to that table again, I'm providing an alias for the subquery. I'm telling the db to refer to the results of the subquery by the name `stars`. Note that if I did something like `JOIN Stars` it'd actually refer to the table on disk (and would need some new alias to be able to safely use the `ON` clause). It's completely possible to do something like `FROM Stars Planets JOIN Planets Stars` to flip the in-query names of tables, although there's usually little point for the confusion that would cause. – Clockwork-Muse May 26 '14 at 01:47
  • Great answer. But let's say I need the stars information and the name of his largest orbitDistance planet (not all his planets). We can still have stars without planets. How can I add such condition to a left join? – xaume Jan 12 '16 at 19:30
  • 1
    @xaume - separate question, but it's actually a [tag:greatest-n-per-group] (sub) problem. – Clockwork-Muse Jan 12 '16 at 22:35
9
SELECT * FROM (SELECT [...]) as Alias1
LEFT OUTER JOIN 
    (SELECT [...]) as Alias2
ON Alias1.id = Alias2.id
Alexandru Marculescu
  • 5,569
  • 6
  • 34
  • 50
fieven
  • 91
  • 1
  • 1
2
WITH( 
 SELECT 
   stars.name AS starname, ((star.class+7)*star.intensity)*1000000) AS startemp,
   stars.starid
 FROM
   stars
) AS star_temps
SELECT 
   planets.name AS planetname, (startemp-50*planets.orbitdistance) AS planettemp
   star_temps.starname, star_temps.startemp
FROM 
   star_temps LEFT OUTER JOIN planets USING (star_id)
WHERE
   star_temps.starid < 100;

Alternatively, one can construct a sub-query (I've used a common table expression) to accomplish the same task as demonstrated below:

SELECT 
   planets.name AS planetname, (startemp-50*planets.orbitdistance) AS planettemp
   star_temps.starname, star_temps.startemp
FROM 
   (SELECT 
      stars.name AS starname, ((star.class+7)*star.intensity)*1000000) AS startemp,
      stars.starid
    FROM
      stars
 ) AS star_temps
LEFT OUTER JOIN planets USING (star_id)
WHERE
   star_temps.starid < 100;
SystemFun
  • 1,062
  • 4
  • 11
  • 21
  • Thank you! Can you please complete this using LEFT OUTER JOIN? – verkter May 25 '14 at 05:49
  • Actually, because the query isn't using `LEFT JOIN`, it's only going to print rows that _have_ planets, instead of leaving out that information if not present. – Clockwork-Muse May 25 '14 at 05:56
  • @Clockwork-Muse You are correct, read question to quickly, updated answer to include all stars. – SystemFun May 25 '14 at 06:26
  • @verkter hope that helps! – SystemFun May 25 '14 at 06:26
  • As systems that _do_ support CTEs (it's not universal) are allowed to build temp-tables for them, it's best to put as many restricting conditions in them as possible (ie, `starid < 100`). `USING()` is also not universal, and is actually potentially dangerous: What happens when you're trying to find which actresses have been to which planets, and their id is also `starId`? To say nothing of the times when you need to join multiple `Stars`, but on some other column? Please, always explicitly list the join conditions. Still, this _does_ answer the question, so +1. – Clockwork-Muse May 25 '14 at 06:35
  • I'm not sure I follow the USING() example being dangerous? Care to elaborate a bit? – SystemFun May 25 '14 at 06:37
  • Let's say we want to compare the intensities of all stars in the same class, normally you'd want to use a self-join like `star1.class = star2.class AND star1.starId <> star2.starId` (the last bit so we don't compare a star to itself). Now, we also want to list the respective stars planets... what happens with `USING()`? How does it know which reference to link to? It can't use both - the `starId`s won't match (so you'd get no planets). And removing the earlier `<>` so it maybe could would end worse - you'd only get planets when the star matched itself (or even worse, only matched stars)! – Clockwork-Muse May 25 '14 at 07:26
  • One more thing. Why is there no comma after Stars in "WHERE starId < 100) Stars LEFT JOIN Planets"? Is it OK to omit commas? Thanks again. – verkter May 25 '14 at 19:34