2
CREATE TABLE most_prof
(
    pub_id CHAR(4) NOT NULL,
    top_profit VARCHAR(80) NOT NULL,
    date_time DATETIME,
    PRIMARY KEY (top_profit)
)

INSERT INTO most_prof (pub_id, top_profit, date_time)
    SELECT t.pub_id, t.title, t.pubdate
    FROM titles AS t 

This is homework so I am by no means asking for an answer, just some guidance! In the select statement, where I have t.title, I need to return the title that is associated with a calculated top profit. So I need to take a few columns, calculate the top profit, and return that title associated with it. Each pub_id has multiple titles associated with it but I just need the one with the top profit and the pubdate associated with it.

I've tried a few things but I keep getting this error

Column 'titles.pub_id' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.

Since this is homework, please no answers, just looking for guidance as best you can.

EDIT:

In the SELECT statement t.title returns the title of a book, however this has to be a specific title. To find that, I need to use other columns in the table to calculate each publishers titles profit - something like (((price - (royalty * 1.0 / 100)) * ytd_sales) - advance). Then I need to use that to return 1 title that has the highest profit from each publisher.

2ND EDIT: This is what gives me the error as stated above

INSERT INTO most_prof(pub_id, top_profit, date_time)
    SELECT t.pub_id, (((price - (royalty * 1.0 / 100)) * ytd_sales) - 
advance), t.pubdate
    FROM titles AS t 
    GROUP BY t.title;

3rd EDIT:

INSERT INTO most_prof (pub_id, top_profit, date_time)
    SELECT DISTINCT
        pub_id, MAX (((price - (royalty * 1.0 / 100)) * ytd_sales) - advance), 
        pubdate
    FROM titles
    GROUP BY pub_id

Now this would return the pub_id, the top_profit, and the pubdate however:

  1. While it returns the correct profit I need the title associated with that, not the profit itself
  2. selecting pubdate causes that aggregate error again - cant seem to shake that

EDIT 4: Based on comment progress pic

I've added a screen grab of what I got so far - and it's looking good. As you can see i've got multiple titles for each pub_id - how can I return only the most profitable title for each of the 3 pub_id's??

eran0801
  • 39
  • 4

1 Answers1

2

First off, I would like to thank you for this :

This is homework so I am by no means asking for an answer, just some guidance!

and this :

Since this is homework, please no answers, just looking for guidance as best you can.

What I'll try to do, is giving you a some hints that would lead you to your answer.

(From the given information) I assume this homework, will be using aggregation functions (e.g. SUM(), AVG(), MIN(),MAX() ..etc), GROUP BY, (maybe HAVING), and ORDER BY.

Remember, whenever you use aggregation functions, you'll need to specify the columns that are under SELECT, and they're not contained with an aggregation function in the GROUP BY clause.

So, if you did this :

SELECT  t.pub_id,  SUM(((price - (royalty * 1.0 / 100)) * ytd_sales) - advance), t.pubdate
FROM titles AS t 
GROUP BY t.title;

it'll give you an error, because there is no t.title under the SELECT clause. So, the correct one would be something like this : GROUP BY t.pub_id, t.pubdate

If you want it to be grouped by t.title, you'll need to do this :

SELECT t.title,  SUM(((price - (royalty * 1.0 / 100)) * ytd_sales) - advance)
    FROM titles AS t 
    GROUP BY t.title;

if there is a required period of time (say you want to get the profits of the last 3 months for each title), then you could use t.pubdate but this will be under the WHERE (or HAVING) clause, and the t.title will be under the SELECT clause.

Now, after you've done the calculations, you'll need to reorder the results by the highest profit, and then you just select the top profit. Some people likes it easier to contain two aggregation functions (say MIN(SUM(...)) ) to get the highest or lowest, which will avoid them from using ORDER BY.

UPDATE (Based on comments)

Since you've covered sub-queries, you're very close to the answer.

After running the second query for titles, I'm able to see all the titles and their profits, I can't seem to figure out how to correlate those titles with their corresponding pub_id. Lets say I have 3 pub_id's - and each pub_id has multiple titles. I can't figure out how to correlate the top_profit title for a particular pub_id, if that makes sense.

The correct way to get the profits is to find the unique id that you can use to sum up the profits for. for your case you said pub_id has multiple title then, I assume that pub_id is the primary key and the titles are assigned to each primary key. (for instance, a publisher can published multiple books) so you need to get the publisher profits from all the published books.

With this, you know that you'll need to get the publisher and not the books. So, in your query, you'll need to replace titles with pub_id and keep the rest as is. this would sum-up all profit based on pub_id, which is the main requirement that you need.

If you need to get the profit for each publisher's book, you can use

SELECT t.pub_id, t.title SUM(....) as Profit
FROM  titles AS t 
GROUP BY t.pub_id, t.title; 

This is like telling SQL : Hey, get me the profit of each book for publisher. which will give you the profit for each book.

But if you do this :

SELECT t.pub_id, SUM(....) as Profit
FROM  titles AS t 
GROUP BY t.pub_id; 

It'll get you the profit for each publisher (meaning if a publisher has 5 books, it'll be summed up).

So, the more columns you include in the GROUP BY the SUM() will be detailed further.

If you need to enclose it with a sub-query, there are different ways to do this but I'll give you two ways.

Method one :

SELECT * 
FROM (
    SELECT t.pub_id, SUM(....) as Profit
    FROM  titles AS t 
    GROUP BY t.pub_id; 
) D -- alias is mandatory

Method two :

SELECT *
FROM titles t1 
JOIN (
    SELECT t.pub_id, SUM(....) as Profit
    FROM  titles AS t 
    GROUP BY t.pub_id; 
) t2 ON t1.pub_id = t2.pub_id

So, you can use method one or method two, whatever you like.

UPDATE (Based on comments) #2

I've added a screen grab of what I got so far - and it's looking good. As you can see I've got multiple titles for each pub_id - how can I return only the most profitable title for each of the 3 pub_id's??

Great, you're almost done. You'll need to use a function named ROW_NUMBER() to number the rows based on our custom condition. So we will add this : ROW_NUMBER() OVER(PARTITION BY t1.pub_id ORDER BY Profit DESC) AS ProfitOrder The new order will be partitioned by t1.pub_id and order each partition them by the highest to the lowest Profit (for each id group).

Our query should something like this :

SELECT 
        t1.title
    ,   t1.pub_id
    ,   t1.pubdate
    ,   Profit
    ,   ROW_NUMBER() OVER(PARTITION BY t1.pub_id ORDER BY Profit DESC) AS ProfitOrder
    FROM titles t1 
    JOIN (
        SELECT t.pub_id, t.title, SUM(price * ytd_sales) as Profit
        FROM  titles AS t 
        GROUP BY t.pub_id 
    ) t2 ON t1.pub_id = t2.pub_id

If you run the above query, the ProfitOrder number 1 for each pub_id will have the highest Profit, this verifies that our records is sorted out correctly, and we just need to get the first 3 rows from each group using the ProfitOrder. and if we need to do this :

SELECT 
    t1.title
,   t1.pub_id
,   t1.pubdate
,   Profit
,   ROW_NUMBER() OVER(PARTITION BY t1.pub_id ORDER BY Profit DESC) AS ProfitOrder
FROM titles t1 
JOIN (
    SELECT t.pub_id, t.title, SUM(price * ytd_sales) as Profit
    FROM  titles AS t 
    GROUP BY t.pub_id 
) t2 ON t1.pub_id = t2.pub_id
WHERE 
    ProfitOrder <= 3

it will yield an error, because we can't use column alias under WHERE, unless if we convert the query into a sub-query. So, we need to contain this query (again) into a sub-query. Like this one :

SELECT *
FROM (
    SELECT 
        t1.title
    ,   t1.pub_id
    ,   t1.pubdate
    ,   Profit
    ,   ROW_NUMBER() OVER(PARTITION BY t1.pub_id ORDER BY Profit DESC) AS ProfitOrder
    FROM titles t1 
    JOIN (
        SELECT t.pub_id, t.title, SUM(price * ytd_sales) as Profit
        FROM  titles AS t 
        GROUP BY t.pub_id 
    ) t2 ON t1.pub_id = t2.pub_id
) D 
WHERE 
    ProfitOrder <= 3

Now, verify the records again, if the output is what you expected. Then, you will only need to use the top SELECT * and select the columns that you need to be showed only. LIKE SELECT pub_id, Profit, pubdate, and include it under your INSERT but make sure that you matched the columns between the INSERT and your SELECT.

iSR5
  • 3,274
  • 2
  • 14
  • 13
  • This does help, thank you. I am just really at a loss on how to return the title of the book that corresponds to our calculated top_profit, along with its pub_id and pub_date. I keep getting that aggregate error. – eran0801 Nov 15 '18 at 03:55
  • @eran0801 think of your output first, how the output will be, in your case, you want to show (title) then, the column title would be under the `SELECT` without any aggregation , the rest will be just used as filters under `WHERE`. So, for start, you can start using the second query on my answer, and start building the rest of the requirements on top of it. let me know if you have any difficulties I'll try my best. – iSR5 Nov 15 '18 at 05:13
  • @eran0801 just I need to know, have you covered correlated queries ( aka sub-queries) in your course ?. – iSR5 Nov 15 '18 at 05:15
  • when I only have SELECT t.title I get the an error because I need to insert something into pub_id, and date_time as well but having only title in the select statement leaves me short a couple items – eran0801 Nov 15 '18 at 15:02
  • After running the second query for titles, I'm able to see all the titles and their profits, I can't seem to figure out how to correlate those titles with their corresponding pub_id. Lets say I have 3 pub_id's - and each pub_id has multiple titles. I can't figure out how to correlate the top_profit title for a particular pub_id, if that makes sense. – eran0801 Nov 15 '18 at 15:24
  • @eran0801 great, I've updated my answer, see updates. I hope it'll make things easier to understand. – iSR5 Nov 15 '18 at 16:11
  • @eran0801 just let me ask you a question, do you want to sum the titles for each publisher and get the top 3 (highest) profits. So, each publisher will have 3 titles along with the profit and the date of each title. is that what you want ? – iSR5 Nov 15 '18 at 17:25
  • @eran0801 I asked you that because your insert is only including the `pub_id,pub_id, top_profit, date_time)` while you're adding more columns than it should. – iSR5 Nov 15 '18 at 17:31
  • @iRS5 There are 3 different pub_id's, I need to populate the table with each pub_id (once each) followed by the highest profit title that the publisher (pub_id) published and the pub_date for that title. So should be 3 rows total – eran0801 Nov 15 '18 at 20:47
  • @eran0801 check the update, and let me know if you need further help . – iSR5 Nov 15 '18 at 23:13