1

I have a query with joins and subquery (derived table). If I run it without LIMIT 1 the result will contain the vat and the id field with the proper values.

The first query:

SELECT i.id, i.vat, pl.invoice_id as inv_id, pl.product_id as pl_id, pl.quantity as qty, pl.price, pl.currency, p.name, p.manufacturer, p.list_price, p.cost_price, p.wholesale_price, p.cikkszam, p.unit, p.group_name 
FROM soulnsoda_products_log pl 
LEFT JOIN soulnsoda_products p ON pl.product_id=p.id 
LEFT JOIN (select id, vat, parent_id, beneficiary_account from soulnsoda_invoices) as i ON i.parent_id>0 AND pl.invoice_id=i.parent_id AND pl.product_id=i.beneficiary_account 
WHERE pl.action=6 AND p.cikkszam = 'S6511415-BLK' AND (pl.stamp BETWEEN '2015-08-15 00:00:00' AND '2015-08-15 23:59:59') AND pl.warehouse_name='Garage - Árkád'
ORDER BY p.cikkszam

The result will contain the i.id and i.vat fields with values, but duplicating rows:

id      vat     inv_id  pl_id   qty     price       name                
93119   27.00   93117   21961   -1.00   1096.85 HUF SUPRA ICON SX BLACK DB
93120   27.00   93117   21961   -1.00   1096.85 HUF SUPRA ICON SX BLACK DB
93119   27.00   93117   21961   -1.00   1096.85 HUF SUPRA ICON SX BLACK DB
93120   27.00   93117   21961   -1.00   1096.85 HUF SUPRA ICON SX BLACK DB  

I have to filter out rows with duplicated ids. When I'm using LIMIT 1 the duplication will be gone but the id and the vat field will be NULL. And I don't know why...

The second query with LIMIT:

SELECT i.id, i.vat, pl.invoice_id as inv_id, pl.product_id as pl_id, pl.quantity as qty, pl.price, pl.currency, p.name, p.manufacturer, p.list_price, p.cost_price, p.wholesale_price, p.cikkszam, p.unit, p.group_name 
FROM soulnsoda_products_log pl 
LEFT JOIN soulnsoda_products p ON pl.product_id=p.id 
LEFT JOIN (select id, vat, parent_id, beneficiary_account from soulnsoda_invoices LIMIT 1) as i ON i.parent_id>0 AND pl.invoice_id=i.parent_id AND pl.product_id=i.beneficiary_account 
WHERE pl.action=6 AND p.cikkszam = 'S6511415-BLK' AND (pl.stamp BETWEEN '2015-08-15 00:00:00' AND '2015-08-15 23:59:59') AND pl.warehouse_name='Garage - Árkád'
ORDER BY p.cikkszam

The result will be two rows which is OK, but there are no id and vat:

id      vat     inv_id  pl_id   qty     price       name                
NULL    NULL    93117   21961   -1.00   1096.85 HUF SUPRA ICON SX BLACK DB
NULL    NULL    93117   21961   -1.00   1096.85 HUF SUPRA ICON SX BLACK DB

I tried LIMIT 1, GROUP BY, MIN, MAX, etc.

I know that DISTINCT is working with my sample, but it is a simplified query to show you the problem itself.

How can I achieve the result with only two rows, but with a value in the id and vat column, using some technique in (or around) the LEFT JOIN subquery?

Tamas
  • 33
  • 5

2 Answers2

0

If your problem is duplication of rows, then use select distinct. Actually, it would be better to find the cause of the duplication, but this might be what you want:

SELECT distinct i.id, i.vat, pl.invoice_id as inv_id, pl.product_id as pl_id,
       pl.quantity as qty, pl.price, pl.currency, p.name, p.manufacturer, 
       p.list_price, p.cost_price, p.wholesale_price, p.cikkszam, p.unit, p.group_name 
FROM soulnsoda_products_log pl LEFT JOIN
     soulnsoda_products p
     ON pl.product_id = p.id LEFT JOIN
     soulnsoda_invoices i
     ON i.parent_id > 0 AND pl.invoice_id = i.parent_id AND
        pl.product_id = i.beneficiary_account 
WHERE pl.action = 6 AND
      p.cikkszam = 'S6511415-BLK' AND
      pl.stamp >= '2015-08-15' AND pl.stamp < '2015-08-16' AND    
      pl.warehouse_name = 'Garage - Árkád'
ORDER BY p.cikkszam

Some notes:

  • The left join on p is being turned into an inner join by the where clause.
  • The subquery is not necessary and, in fact, just hurts performance.
  • This version simplifies the date comparisons.
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • I tried the distinct in that way, but as I try to explain it I have to deal with the subquery since it is simplified version to understand my problem. In the live query there are some SUM in the parent SELECT. So it's important to deal with only two rows in the inner select. But thank you for your effort! – Tamas Aug 22 '15 at 16:22
0

In your first query, the clause WHERE ... p.cikkszam = 'S6511415-BLK' converts your first LEFT JOIN into an ordinary inner JOIN.

ORDER BY p.cikkszam clause does nothing when combined with WHERE ... p.cikkszam = 'S6511415-BLK': there's only one value in that column of your result set.

Your timestamp end-of-interval matching is a little clumsy and almost correct. Try this, and it will be perfect.

         pl.stamp >= '2015-08-15'
     AND pl.stamp <  '2015-08-15' + INTERVAL 1 DAY

In your second query this subquery clause only allows one, randomly chosen, invoice row to be joined to the rest of your query:

   select id, vat, parent_id, beneficiary_account from soulnsoda_invoices LIMIT 1

This picks the "first" row of the invoices table. The trouble is, without ORDER BY MySQL and other RDMS engines have no solid notion of "first". So your second query isn't right.

Your query seems to ask for one row per invoice for a particular product ('S6511415-BLK') shipped (action=6) from a certain warehouse on a certain day. There are two matching rows in the invoices table in your data. There are also two matching rows in some other table. Or maybe there are four matching rows in your invoices table.

You have not described the contents of your tables to us. So it's hard for us to help you figure out where your duplicate rows are. You can use SELECT DISTINCT to eliminate the dups in your invoices table, like this, if you wish.

Here's the query I suggest. This will give you a two-row result set.

SELECT i.id, i.vat, 
       pl.invoice_id as inv_id, pl.product_id as pl_id, pl.quantity as qty,
       pl.price, pl.currency,
       p.name, p.manufacturer, p.list_price, p.cost_price,
       p.wholesale_price, p.cikkszam, p.unit, p.group_name 
  FROM soulnsoda_products_log pl 
  JOIN soulnsoda_products p ON pl.product_id=p.id 
  LEFT JOIN (SELECT DISTINCT id, vat,
                             parent_id, beneficiary_account
                        FROM soulnsoda_invoices
            ) as i ON  i.parent_id > 0 
                   AND pl.invoice_id = i.parent_id
                   AND pl.product_id = i.beneficiary_account    
 WHERE pl.action=6
   AND p.cikkszam = 'S6511415-BLK' 
   AND pl.stamp >= '2015-08-15'
   AND pl.stamp <  '2015-08-15' + INTERVAL 1 DAY
   AND pl.warehouse_name='Garage - Árkád'
 ORDER BY p.cikkszam, i.id
O. Jones
  • 103,626
  • 17
  • 118
  • 172
  • Thank you Ollie. I know it is hard to figure out the right way since the structure was not explained. If I run your select I got 4 rows. The main problem is that there are two items on the invoice in different rows. So the invoice_id is the same, the product_id is the same, just the invoice item row's id (it is the id itself in the invoces table) is the different. In that case, when I joined the invoices and the products_log table with the invoiced_id and the paroduct_id, I have four rows. Since there are two matching rows with that it will be duplicated... – Tamas Aug 22 '15 at 16:29