1

I have the following code which produces a relation:

SELECT book_id, shipments.customer_id
FROM shipments 
LEFT JOIN editions ON (shipments.isbn = editions.isbn)
LEFT JOIN customers ON (shipments.customer_id = customers.customer_id)

In this relation, there are customer_ids as well as book_ids of books they have bought. My goal is to create a relation with each book in it and then how many unique customers bought it. I assume one way to achieve this is to eliminate all duplicate rows in the relation and then counting the instances of each book_id. So my question is: How can I delete all duplicate rows from this relation?

Thanks!

EDIT: So what I mean is that I want all the rows in the relation to be unique. If there are three identical rows for example, two of them should be removed.

Sahand
  • 7,980
  • 23
  • 69
  • 137
  • Please define *duplicates*: duplicates in terms of ordered books, or in terms of customers who placed the orders? [after rereading: customers ... No: you also want a count. Please rephrase.] – wildplasser Sep 10 '15 at 23:09
  • It sounds like you just want a `GROUP BY`... – Craig Ringer Sep 10 '15 at 23:18
  • Craig, I have experimented with the GROUP BY but with no good results, any suggestions to how I could use it? – Sahand Sep 10 '15 at 23:24

2 Answers2

0

This will give you all the {customer,edition} pairs for which an order exists:

SELECT *
FROM customers c
JOIN editions e ON (
    SELECT * FROM shipments s
    WHERE s.isbn = e.isbn
      AND s.customer_id = c.customer_id
    );
wildplasser
  • 43,142
  • 8
  • 66
  • 109
0

The duplicates are in table shipments. You can remove these with a DISTINCT clause and then count them in an outer query GROUP BY isbn:

SELECT isbn, count(customer_id) AS unique_buyers
FROM (
  SELECT DISTINCT isbn, customer_id FROM shipments) book_buyer
GROUP BY isbn;

If you want a list of all books, even where no purchases were made, you should LEFT JOIN the above to the list of all books:

SELECT isbn, coalesce(unique_buyers, 0) AS books_sold_to_unique_buyers
FROM editions
LEFT JOIN (
  SELECT isbn, count(customer_id) AS unique_buyers
  FROM (
    SELECT DISTINCT isbn, customer_id FROM shipments) book_buyer
  GROUP BY isbn) books_bought USING (isbn)
ORDER BY isbn;

You can write this more succinctly by joining before counting:

SELECT isbn, count(customer_id) AS books_sold_to_unique_buyers
FROM editions
LEFT JOIN (
  SELECT DISTINCT isbn, customer_id FROM shipments) book_buyer USING (isbn)
GROUP BY isbn
ORDER BY isbn;
Patrick
  • 29,357
  • 6
  • 62
  • 90