0

Please note that I'm an absolute n00b in MySQL but somehow I managed to build some (for me) complex queries that work as they should. My main problem now is that for a many of the queries we're working on:

  • The querie is becoming too big and very hard to see through.
  • The same subqueries get repeated many times and that is adding to the complexity (and probably to the time needed to process the query).

We want to further expand this query but we are reaching a point where we can no longer oversee what we are doing. I've added one of these subqueries at the end of this post, just as an example.

!! You can fast foward to the Problem section if you want to skip the details below. I think the question can be answered also without the additional info.

What we want to do

Create a MySQL query that calculates purchase orders and forecasts for a given supplier based on:

  • Sales history in a given period (past [x] months = interval)
  • Current stock
  • Items already in backorder (from supplier)
  • Reserved items (for customers)
  • Supplier ID

I've added an example of a subquery at the bottom of this message. We're showing just this part to keep things simple for now. The output of the subquery is:

  • Part number
  • Units sold
  • Units sold (outliers removed)
  • Units sold per month (outliers removed)
  • Number of invoices with the part number in the period (interval)

It works quite OK for us, although I'm sure it can be optimised. It removes outliers from the sales history (e.g. one customer that orders 50 pcs of one product in one order). Unfortunately it can only remove outliers with substantial data, so if the first order happens to be 50 pcs then it is not considered an outlier. For that reason we take the amount of invoices into account in the main query. The amount of invoices has to exceed a certain number otherwise the system wil revert to a fixed value of "maximum stock" for that product.

As mentioned this is only a small part of the complete query and we want to expand it even further (so that it takes into account the "sales history" of parts that where used in assembled products).

For example if we were to build and sell cars, and we want to place an order with our tyre supplier, the query calculates the amount of tyres we need to order based on the sales history of the various car models (while also taking into account the stock of the cars, reserved cars and stock of the tyres).

Problem

The query is becomming massive and incomprehensible. We are repeating the same subqueries many times which to us seems highly inefficient and it is the main cause why the query is becomming so bulky.

What we have tried

(Please note that we are on MySQL 5.5.33. We will update our server soon but for now we are limited to this version.)

  1. Create a VIEW from the subqueries. The main issue here is that we can't execute the view with parameters like supplier_id and interval period. Our subquery calculates the sum of the sold items for a given supplier within the given period. So even if we would build the VIEW so that it calculates this for ALL products from ALL suppliers we would still have the issue that we can't define the interval period after the VIEW has been executed.

  2. A stored procedure. Correct me if I'm wrong but as far as I know, MySQL only allows us to perform a Call on a stored procedure so we still can't run it against the parameters (period, supplier id...)

Even this workaround won't help us because we still can't run the SP against the parameters.

  1. Using WITH at the beginning of the query

A common table expression in MySQL is a temporary result whose scope is confined to a single statement. You can refer this expression multiple times with in the statement. The WITH clause in MySQL is used to specify a Common Table Expression, a with clause can have one or more comms-separated subclauses.

Not sure if this would be the solution because we can't test it. WITH is not supported untill MySQL version 8.0.

What now?

My last resort would be to put the mentioned subqueries in a temp table before starting the main query. This might not completely eliminate our problems but at least the main query will be more comprehensible and with less repetition of fetching the same data. Would this be our best option or have I overlooked a more efficient way to tackle this?

Many thanks for your kind replies.

  SELECT 
    GREATEST((verkocht_sd/6*((100 + 0)/100)),0) as 'units sold p/month ',     
    GREATEST(ROUND((((verkocht_sd/6)*3)-voorraad+reserved-backorder),0),0) as 'Order based on units sold',
    SUM(b.aantal) as 'Units sold in period', 
    t4.verkocht_sd as 'Units sold in period, outliers removed',
    COUNT(*) as 'Number of invoices in period',
    b.art_code as 'Part number'
   FROM bongegs b -- Table that has all the sales records for all products
   RIGHT JOIN totvrd ON (totvrd.art_code = b.art_code) -- Right Join stock data to also include items that are not in table bongegs (no sales history). 
   LEFT JOIN artcred ON (artcred.art_code = b.art_code) -- add supplier ID to the part numbers. 
   LEFT JOIN
    (
     SELECT 
      SUM(b.aantal) as verkocht_sd, 
      b.art_code
     FROM bongegs b
     RIGHT JOIN totvrd ON (totvrd.art_code = b.art_code)
     LEFT JOIN artcred ON (artcred.art_code = b.art_code)
     WHERE 
      b.bon_datum > DATE_SUB(CURDATE(), INTERVAL 6 MONTH)
      and b.bon_soort = "f" -- Selects only invoices
      and artcred.vln = 1  -- 1 = Prefered supplier
      and artcred.cred_nr = 9117 -- Supplier ID
      and b.aantal < (select * from (SELECT AVG(b.aantal)+3*STDDEV(aantal) 
           FROM bongegs b
           WHERE 
           b.bon_soort = 'f' and 
           b.bon_datum > DATE_SUB(CURDATE(), INTERVAL 6 MONTH)) x)
     GROUP BY b.art_code
    ) AS t4
   ON (b.art_code = t4.art_code) 

   WHERE 
    b.bon_datum > DATE_SUB(CURDATE(), INTERVAL 6 MONTH)
    and b.bon_soort = "f" 
    and artcred.vln = 1  
    and artcred.cred_nr = 9117
   GROUP BY b.art_code

Bongegs | all rows from sales forms (invoices F, offers O, delivery notes V)

| art_code | bon_datum  | bon_soort | aantal |
|:---------|:---------: |:---------:|:------:|
|  item_1  | 2021-08-21 |    f      |    6   |
|  item_2  | 2021-08-29 |    v      |    3   |
|  item_6  | 2021-09-03 |    o      |    2   |
|  item_4  | 2021-10-21 |    f      |    6   |
|  item_1  | 2021-11-21 |    o      |    6   |
|  item_3  | 2022-01-17 |    v      |    6   |
|  item_1  | 2022-01-21 |    o      |    6   |
|  item_4  | 2022-01-26 |    f      |    6   |

Artcred | supplier ID's

| art_code | vln  | cred_nr |
|:---------|:----:|:-------:|
|  item_1  |   1  |  1001   | 
|  item_2  |   1  |  1002   | 
|  item_3  |   1  |  1001   |   
|  item_4  |   1  |  1007   |   
|  item_5  |   1  |  1004   |   
|  item_5  |   2  |  1008   |
|  item_6  |   1  |  1016   |
|  item_7  |   1  |  1567   |

totvrd | stock

| art_code |  voorraad  | reserved | backorder |
|:---------|:---------: |:--------:|:---------:|
|  item_1  |      1     |    0     |      5    |
|  item_2  |      0     |    0     |      0    |
|  item_3  |     88     |    0     |      0    |
|  item_4  |      9     |    0     |      0    |  
|  item_5  |     67     |    2     |     20    |
|  item_6  |    112     |    9     |      0    |
|  item_7  |     65     |    0     |      0    |
|  item_8  |      7     |    1     |      0    |

Now, on to the query. You have LEFT JOINs to the artcred table, but then include artcred in the WHERE clause making it an INNER JOIN (required both left and right tables) in the result. Was this intended, or are you expecting more records in the bongegs table that do NOT exist in the artcred.

Well to be honest I was not fully aware that this would essentially form an INNER JOIN but in this case it doesn't really matter. A record that exists in bongegs always exists in artcred as well (every sold product must have a supplier). That doesn't work both ways since a product can be in artcred without ever being sold.

You also have RIGHT JOIN on totvrd which implies you want every record in the TotVRD table regardless of a record in the bongegs table. Is this correct?

Yes it is intended. Otherwise only products with actual sales in the period would end up in the result and we also wanted to include products with zero sales.

Evad
  • 1
  • 2
  • Your current query doesn't look too daunting, but I would add some clarity to it. First, I would add single-letter table aliases. For example `FROM bongegs b`; see that `b` there? Then, the next line could end with `= b.art_code)`; that makes the query easier to read. Second: add aliases to **all columns**; I'm sure it's easy for you to tell to which table the column `bon_soort` belongs to, but we can't. – The Impaler Jan 27 '22 at 15:39
  • Thank you for your comment! I've made some quick fixes (added aliases and some comments) for clarification. I get off work now so I will have to do the table aliases later. – Evad Jan 27 '22 at 16:43
  • Added tables and some table aliases. If I'm doing something wrong just let me know and I'll correct it. I rarely work with aliases as to me (personally) it is very confusing. I know most of the tables and columns by heart and replacing them with aliases (for me) seems counterproductive. – Evad Jan 27 '22 at 21:29

2 Answers2

0

By getting what I believe are all the pieces you had, I think this query significantly simplifies the query. Lets first start with the fact that you were trying to eliminate the outliers by selecting the standard deviation stuff as what to be excluded. Then you had the original summation of all sales also from the bongegs table.

To simplify this, I have the sub-query ONCE internal that does the summation, counts, avg, stddev of all orders (f) within the last 6 months. I also computed the divide by 6 for per-month you wanted in the top.

Since the bongegs is now all pre-aggregated ONCE, and grouped per art_code, it does not need to be done one after the other. You can use the totals directly at the top (at least I THINK is similar output without all actual data and understanding of your context).

So the primary table is the product table (Voorraad) and LEFT-JOINED to the pre-query of bongegs. This allows you to get all products regardless of those that have been sold.

Since the one aggregation prequery has the avg and stddev in it, you can simply apply an additional AND clause when joining based on the total sold being less than the avg/stddev context.

The resulting query below.

SELECT 
        -- appears you are looking for the highest percentage?
        -- typically NOT a good idea to name columns starting with numbers, 
        -- but ok.  Typically let interface/output name the columns to end-users
        GREATEST((b.verkocht_sdperMonth * ((100 + 0)/100)),0) as 'units sold p/month',
        -- appears to be the total sold divided by 6 to get monthly average over 6 months query of data
        GREATEST( ROUND(
            ( (b.verkocht_sdperMonth * 3) - v.voorraad + v.reserved - v.backorder), 0), 0) 
            as 'Order based on units sold',
        b.verkocht_sd as 'Units sold in period', 
        b.AvgStdDev as 'AvgStdDeviation',
        b.NumInvoices as 'Number of invoices in period',
        v.art_code as 'Part number'
    FROM
        -- stock, master inventory, regardless of supplier
        -- get all products, even though not all may be sold
        Voorraad v
            -- LEFT join to pre-query of Bongegs pre-grouped by the art_code which appears
            -- to be basis of all other joins, std deviation and average while at it
            LEFT JOIN 
            (select
                    b.arc_code,
                    count(*) NumInvoices,
                    sum( b.aantal ) verkocht_sd,
                    sum( b.aantal ) / 6.0 verkocht_sdperMonth,
                    avg( b.aantal ) AvgSale,
                    AVG(b.aantal) + 3 * STDDEV( b.aantal) AvgStdDev
                from
                    bongegs b
                        JOIN artcred ac
                            on b.art_code = ac.art_code
                            AND ac.vln = 1
                            and ac.cred_nr = 9117
                where
                        -- only for ORDERS ('f') and within last 6 months
                        b.bon_soort = 'f' 
                    AND b.bon_datum > DATE_SUB(CURDATE(), INTERVAL 6 MONTH) 
                group by
                    b.arc_code ) b
                -- result is one entry per arc_code, thus preventing any Cartesian product
                ON v.art_code = b.art_code
    GROUP BY 
        v.art_code
DRapp
  • 47,638
  • 12
  • 72
  • 142
  • Thank you so much for your extensive reply! Really appreciate it. I've edited and added the parts you mentioned, please let me know if I've missed something. Answers to your question in the original post. I did not know where else to put that so if that was wrong just let me know and I will change it. I'll be looking into the temp tables next, thanks for your comments. – Evad Jan 27 '22 at 21:58
  • 1
    @Evad, can you do a dump/sample of the totvrd table as well by editing your question and adding with the others please. Additionally, your AVG() and STDDEV() are against the entire bongegs table, not grouped per specific art_code. Was that an oversight and SHOULD be avg() + stddev() at the per art_code as well? Some products obviously sell more than others and would not want to overstate/estimate needs by the global average. – DRapp Jan 27 '22 at 22:30
  • @Evad, completely revised query. See context and hope it is easier to follow and still gets what you are looking for. – DRapp Jan 28 '22 at 00:49
0

One simplification:

    and  b.aantal < ( SELECT * from ( SELECT  AVG ...

-->

    and  b.aantal <                 ( SELECT  AVG ...

A personal problem: my brain hurts when I see RIGHT JOIN; please rewrite as LEFT JOIN.

Check you RIGHTs and LEFTs -- that keeps the other table's rows even if there is no match; are you expecting such NULLs? That is, it looks like they can all be plain JOINs (aka INNER JOINs).

These might help performance:

b:  INDEX(bon_soort, bon_datum, aantal, art_code)
totvrd:  INDEX(art_code)
artcred:  INDEX(vln, cred_nr, art_code)

Is b the what you keep needing? Build a temp table:

CREATE TEMPORARY TABLE tmp_b
    SELECT ...
        FROM b
        WHERE ...;

But if you need to use tmp_b multiple times in the same query, (and since you are not yet on MySQL 8.0), you may need to make it a non-TEMPORARY table for long enough to run the query. (If you have multiple connections building the same permanent table, there will be trouble.)

Yes, 5.5.33 is rather antique; upgrade soon.

(pre

Rick James
  • 135,179
  • 13
  • 127
  • 222