First off, I would like to thank you for this :
This is homework so I am by no means asking for an answer, just some
guidance!
and this :
Since this is homework, please no answers, just looking for guidance
as best you can.
What I'll try to do, is giving you a some hints that would lead you to your answer.
(From the given information) I assume this homework, will be using aggregation functions (e.g. SUM(), AVG(), MIN(),MAX() ..etc), GROUP BY, (maybe HAVING), and ORDER BY.
Remember, whenever you use aggregation functions, you'll need to specify the columns that are under SELECT, and they're not contained with an aggregation function in the GROUP BY clause.
So, if you did this :
SELECT t.pub_id, SUM(((price - (royalty * 1.0 / 100)) * ytd_sales) - advance), t.pubdate
FROM titles AS t
GROUP BY t.title;
it'll give you an error, because there is no t.title
under the SELECT clause. So, the correct one would be something like this :
GROUP BY t.pub_id, t.pubdate
If you want it to be grouped by t.title, you'll need to do this :
SELECT t.title, SUM(((price - (royalty * 1.0 / 100)) * ytd_sales) - advance)
FROM titles AS t
GROUP BY t.title;
if there is a required period of time (say you want to get the profits of the last 3 months for each title), then you could use t.pubdate
but this will be under the WHERE (or HAVING) clause, and the t.title
will be under the SELECT clause.
Now, after you've done the calculations, you'll need to reorder the results by the highest profit, and then you just select the top profit. Some people likes it easier to contain two aggregation functions (say MIN(SUM(...))
) to get the highest or lowest, which will avoid them from using ORDER BY.
UPDATE (Based on comments)
Since you've covered sub-queries, you're very close to the answer.
After running the second query for titles, I'm able to see all the
titles and their profits, I can't seem to figure out how to correlate
those titles with their corresponding pub_id. Lets say I have 3
pub_id's - and each pub_id has multiple titles. I can't figure out how
to correlate the top_profit title for a particular pub_id, if that
makes sense.
The correct way to get the profits is to find the unique id that you can use to sum up the profits for. for your case you said pub_id
has multiple title
then, I assume that pub_id
is the primary key and the titles are assigned to each primary key. (for instance, a publisher can published multiple books) so you need to get the publisher profits from all the published books.
With this, you know that you'll need to get the publisher and not the books. So, in your query, you'll need to replace titles
with pub_id
and keep the rest as is. this would sum-up all profit based on pub_id
, which is the main requirement that you need.
If you need to get the profit for each publisher's book, you can use
SELECT t.pub_id, t.title SUM(....) as Profit
FROM titles AS t
GROUP BY t.pub_id, t.title;
This is like telling SQL : Hey, get me the profit of each book for publisher. which will give you the profit for each book.
But if you do this :
SELECT t.pub_id, SUM(....) as Profit
FROM titles AS t
GROUP BY t.pub_id;
It'll get you the profit for each publisher (meaning if a publisher has 5 books, it'll be summed up).
So, the more columns you include in the GROUP BY
the SUM()
will be detailed further.
If you need to enclose it with a sub-query, there are different ways to do this but I'll give you two ways.
Method one :
SELECT *
FROM (
SELECT t.pub_id, SUM(....) as Profit
FROM titles AS t
GROUP BY t.pub_id;
) D -- alias is mandatory
Method two :
SELECT *
FROM titles t1
JOIN (
SELECT t.pub_id, SUM(....) as Profit
FROM titles AS t
GROUP BY t.pub_id;
) t2 ON t1.pub_id = t2.pub_id
So, you can use method one or method two, whatever you like.
UPDATE (Based on comments) #2
I've added a screen grab of what I got so far - and it's looking good.
As you can see I've got multiple titles for each pub_id - how can I
return only the most profitable title for each of the 3 pub_id's??
Great, you're almost done. You'll need to use a function named ROW_NUMBER()
to number the rows based on our custom condition. So we will add this :
ROW_NUMBER() OVER(PARTITION BY t1.pub_id ORDER BY Profit DESC) AS ProfitOrder
The new order will be partitioned by t1.pub_id and order each partition them by the highest to the lowest Profit (for each id group).
Our query should something like this :
SELECT
t1.title
, t1.pub_id
, t1.pubdate
, Profit
, ROW_NUMBER() OVER(PARTITION BY t1.pub_id ORDER BY Profit DESC) AS ProfitOrder
FROM titles t1
JOIN (
SELECT t.pub_id, t.title, SUM(price * ytd_sales) as Profit
FROM titles AS t
GROUP BY t.pub_id
) t2 ON t1.pub_id = t2.pub_id
If you run the above query, the ProfitOrder number 1 for each pub_id will have the highest Profit, this verifies that our records is sorted out correctly, and we just need to get the first 3 rows from each group using the ProfitOrder.
and if we need to do this :
SELECT
t1.title
, t1.pub_id
, t1.pubdate
, Profit
, ROW_NUMBER() OVER(PARTITION BY t1.pub_id ORDER BY Profit DESC) AS ProfitOrder
FROM titles t1
JOIN (
SELECT t.pub_id, t.title, SUM(price * ytd_sales) as Profit
FROM titles AS t
GROUP BY t.pub_id
) t2 ON t1.pub_id = t2.pub_id
WHERE
ProfitOrder <= 3
it will yield an error, because we can't use column alias under WHERE
, unless if we convert the query into a sub-query. So, we need to contain this query (again) into a sub-query. Like this one :
SELECT *
FROM (
SELECT
t1.title
, t1.pub_id
, t1.pubdate
, Profit
, ROW_NUMBER() OVER(PARTITION BY t1.pub_id ORDER BY Profit DESC) AS ProfitOrder
FROM titles t1
JOIN (
SELECT t.pub_id, t.title, SUM(price * ytd_sales) as Profit
FROM titles AS t
GROUP BY t.pub_id
) t2 ON t1.pub_id = t2.pub_id
) D
WHERE
ProfitOrder <= 3
Now, verify the records again, if the output is what you expected. Then, you will only need to use the top SELECT *
and select the columns that you need to be showed only. LIKE SELECT pub_id, Profit, pubdate
, and include it under your INSERT
but make sure that you matched the columns between the INSERT and your SELECT.