2

I have a problem. I have 4 tables:

Invoice_Payment, Invoice, Client, and Calendar

Basically, I have the below query, which works well, except, months with no date_due don't return. I.E only months with a date_due will be returned.

Note: the calendar table simply lists every day of the year. It contains a single col call date_field

Database: http://oberto.co.nz/jInvoice.txt

Expected output: my current query, below, will return something like:

month       total
August      5
September   0
October  196
November  205
December  214
January  229

Notice how September isn't returned? It's because table Invoice_Payment has no date_due record

I think I have to use a left join and join the calendar table with something like: LEFT JOIN Calendar ON Invoice_Payments.date_paid = Calendar.date_field. But I'm having no luck

SELECT MONTHNAME(Invoice_Payments.date_paid) as month, SUM(Invoice_Payments.paid_amount) AS total
FROM Invoice, Client, Invoice_Payments
WHERE Client.registered_id = 1
AND Client.id = Invoice.client_id
And Invoice.id = Invoice_Payments.invoice_id
AND date_paid IS NOT NULL
GROUP BY YEAR(Invoice_Payments.date_paid), MONTH(Invoice_Payments.date_paid)

Any help appreciated.

Jarrod
  • 9,349
  • 5
  • 58
  • 73

3 Answers3

5

It sounds like you are trying to find values for all dates within a range regardless of whether there is a value or not. Supposing we have a Calendar table structured like so:

Create Table Calendar
(
    [Date] not null Primary Key
)

Your query might look like so (where X and Y represent the start and end date of the range in which you are investigating):

Select Year(C.Date), MonthName(C.Date) As Month
    , Coalesce(Sum(IP.paid_amount),0) As Total
From Calendar As C
    Left Join (Invoice As I
        Join Client As C1
            On C1.id = I.client_id
                And C.registered_id = 1
        Join Invoice_Payments As IP
            On IP.Invoice_id = I.Id)
        On IP.date_paid = C.date
Where C.Date Between X and Y
Group By Year(C.Date), MonthName(C.Date)

Technically, the above query should do the trick. However, another alternative is to use a derived table about which you inquired in the comments:

Select Year(C.Date), MonthName(C.Date) As Month
    , Coalesce(Sum(Z.paid_amount),0) As Total
From Calendar As C
    Left Join   (
                Select IP.date_paid, IP.paid_amount
                From Invoice As I
                        Join Client As C1
                            On C1.id = I.client_id
                                And C.registered_id = 1
                        Join Invoice_Payments As IP
                            On IP.Invoice_id = I.Id
                ) As Z
        On Z.date_paid = C.date
Where C.Date Between X and Y
Group By Year(C.Date), MonthName(C.Date)
Thomas
  • 63,911
  • 12
  • 95
  • 141
  • Hi Thomas. Thanks for replying. I get "#1066 - Not unique table/alias: 'C'" – Jarrod Jan 16 '11 at 07:16
  • @Jarrod - Fixed. I used the alias "C" for both the Calendar table and the Client table. Have simply changed the alias on the Client table. – Thomas Jan 16 '11 at 18:09
  • Hi Thomas, thanks for editing. However, I still get some errors that I cannot resolve: "Unknown column 'I.date_paid' in 'on clause'". I've added the following diagram to help understand my issue: http://oberto.co.nz/db-sql.png – Jarrod Jan 16 '11 at 18:44
  • @Jarrod - That's a column from your original query. Since you did not use a prefix, I have no way of knowing in which table it is contained. I guessed that it was Invoice table. However, if it is the Invoice_Payment table just change I.date_paid to IP.date_paid. – Thomas Jan 16 '11 at 21:58
  • Thanks again. Still not the result I'm after as Only January's data is included. See http://oberto.co.nz/Selection_204.png. I think it's on the right track but it's also missing the condition Client.registered_id = 1. This is driving me mental – Jarrod Jan 16 '11 at 22:43
  • @Jarrod - Given that you are left joining to the Calendar table, there are only so many reasons that you only get January data. 1. Your X and Y range only included Jan. 2. You put the `registered_id=1` criteria in the Where clause and only Jan data has that value. 3. Your calendar table only includes Jan and not all dates across the range you want. – Thomas Jan 16 '11 at 23:11
  • @Jarrod - Btw, I added the registered_id criteria to my solution. – Thomas Jan 16 '11 at 23:11
  • THanks again. We're getting closer I can smell it! I changed the one line by adding MONTHNAME so it now: MONTHNAME(IP.date_paid) = MONTHNAME(C.date_field). THis helped. I got a correct result for every month. The problem now is that the results are too high. Any idea? I REALLY appreciate your time! – Jarrod Jan 16 '11 at 23:29
  • @Jarrod - First, I did not notice that the columns you used in the Group By were different than in the Select. I've corrected that. Joining on the month of the Calendar is likely not correct. You should be able to join directly on the Calendar.Date as I have assuming that the Calendar table indeed contains all dates across the range you want. – Thomas Jan 16 '11 at 23:33
  • @Jarrod - Btw, it might help if you amended your original post to include some same input data that expresses the edge cases and some expected output. – Thomas Jan 16 '11 at 23:35
  • I've added the database and expected results. – Jarrod Jan 16 '11 at 23:48
  • @Jarrod - A derived table should will also work. I've revised my post to show an example. – Thomas Jan 17 '11 at 04:14
  • Thanks Thomas. The last one worked a treat. Thanks for your time! – Jarrod Jan 17 '11 at 07:51
  • @Jarrod - Logically, they are identical. If the first one did not work, that means that MySQL also does not honor the parenthetical joins. – Thomas Jan 17 '11 at 18:32
1

I've run into a similar issue and solved it by moving the non calendar table conditions out of the WHERE clause and into the join. See my full explanation here: https://gist.github.com/1137089

In your case you'll need to factor the from table1, table2, table3 into left join table1 on x=y etc...

todd
  • 2,381
  • 1
  • 20
  • 11
0

I would explicitly define all your joins for readability.

A simple example of an outer join preserving all rows in the first table, filling any missing columns in the second table with nulls:

select column1,
from table1
left outer join table2
  on table1.key = table2.key

Assuming the Calendar table is the one causing you to lose data, here's a shot at your query:

SELECT MONTHNAME(Invoice_Payments.date_paid) as month,
       SUM(Invoice_Payments.paid_amount)
FROM Invoice
JOIN Client
  ON Client.id = Invoice.client_id
 AND Client.registered_id = 1
JOIN Invoice_Payments
  ON Invoice_Payments.invoice_id = Invoice.id
LEFT OUTER JOIN Calendar
  ON Calendar.key = #yourothertable#.key    
WHERE date_paid IS NOT NULL
GROUP BY YEAR(Invoice_Payments.date_paid), MONTH(Invoice_Payments.date_paid)

Good luck!

cag
  • 492
  • 3
  • 14
  • Hi Cag. Thanks for replying. I think this is on the right track. But it's the Invoice_Payment table causing me to loose data. I've updated my question above to show this. If there is not record in the Invoice_Payment table for a particular month, then that month won't show. And that's where I'm stuck – Jarrod Jan 16 '11 at 07:23
  • Hey Jarrod, you might have figured it out by now, but if you are losing data because of the Invoice_Payment table then you could change that particular join in the statement I wrote to: 'LEFT OUTER JOIN Invoice_Payments' and remove the two lines about the calendar – cag Jan 21 '11 at 05:19