1

I have the following tables:

customers

  • cus_id (primary key)
  • driver_licence_id (int UNIQUE)

cars

  • car_id (primary key)
  • cus_id (foreign key to customer and suggests that this customer owners this car)
  • date_created (datetime)

parts (different parts of a car, eg, engine, spark plugs etc)

  • part_id (primary key)
  • manufacturers_id (int)

parts_in_car (Which parts are in the car)

  • part_in_car_id (primary key)
  • car_id (foreign key to car table)
  • part_id (foreign key to parts table)

I am essentially trying to grab a concatenated string of all the manufacturers_id which is in a car which is owned by a specific driver_licence_id. The following SQL query works as I want it to however it takes over 1 second to execute. I have over 1 million total rows. The query I tested with results 20 rows.

SELECT GROUP_CONCAT(p.manufacturers_id ORDER BY p.manufacturers_id) as mids
FROM car c INNER JOIN
     parts_in_car pic
     ON c.car_id = pic.car_id JOIN
     parts p
     ON pic.part_id = p.part_id JOIN
     customers cus
     ON c.cus_id = cus.cus_id 
WHERE cus.driver_licence_id = 5555555
group by c.car_id, c.date_created
ORDER BY c.date_created

I tried doing the following indexes for the sole purpose of this query. Can someone tell me what indexes to create.

# Customer
CREATE INDEX customer_driver_licence_id_idx
ON customer (driver_licence_id);

# cars
CREATE INDEX cars_cus_id_idx
ON cars (cus_id);

# parts
CREATE INDEX parts_manufacturers_id_idx
ON parts (manufacturers_id);

# parts_in_car
CREATE INDEX parts_in_car_part_id_idx
ON parts_in_car (part_id);

CREATE INDEX parts_in_car_car_id_idx
ON parts_in_car (car_id);

Update: The problem is the group by and I already indexed (car_id, date_added) to try solve the problem

#EXPLAIN SELECT
+-------+-------------------------------------+
| table |                 key                 |
+-------+-------------------------------------+
| a     | cus_id                              |
| o     | cars_cus_id_car_id_date_created_idx |
| pip   | parts_in_car_car_id_idx             |
| p     | PRIMARY                             |
+-------+-------------------------------------+
James
  • 299
  • 1
  • 12

4 Answers4

1

This probably requires two sorts:

    group by  c.car_id, c.date_created
    ORDER BY  c.date_created

This would give you similar results faster:

    group by  c.date_created, c.car_id
    ORDER BY  c.date_created, c.car_id

because it can now do a single sort for both steps.

Please provide the EXPLAIN SELECT.... Meanwhile, I will guess that the Optimizer would prefer to start with the only table with filtering:

cus:  (driver_licence_id, cus_id)
c:    (cus_id, date_created, car_id)
pic:  (car_id, part_id)  -- This, or the swapped version could be the PK
p:    (part_id, manufacturers_id) -- probably useless, since part_id is PK

Each is a "covering index", thereby allowing all the work to be done in a INDEX BTrees. Note: The difference in index for c (compared to Gorden's suggestion) may or may not actually help. I based mine on the modified GROUP BY.

With simple JOINs (same as INNER JOIN), the Optimizer almost always starts with the table for which there is filtering (WHERE...). After that, the order of the tables in your query is forced by the ON clauses. So, it was relatively straightforward to decide on the 4 indexes needed.

In other situations, where it is not obvious what order is best for doing the joins, some of the indexes may need flipping.

In particular, if you remove the WHERE, the optimal starting point would be an index on c starting with the two columns in the GROUP BY. That would probably eliminate a sort. Next would come either cus or pic. p would come after pic.

Many:Many

Is parts_in_car a "many-to-many" table? If so, get rid of the PK that you have; it hurts performance. See this for more discussion: http://mysql.rjweb.org/doc.php/index_cookbook_mysql#many_to_many_mapping_table

Rick James
  • 135,179
  • 13
  • 127
  • 222
  • I think you are right that it does 2 sorts. I created every possible combination of indexes for `car`. I made 11 indexes (plus the `outfit_id` index which is created by default as primary key). It uses the index `(av_id, outfit_id, date_created)`. `ORDER BY c.date_created` is the problem. I changed the order by to `ORDER BY c.date_created, c.car_id`. Fixed the problem – James Aug 09 '20 at 03:07
  • Just one question though. In the end, is the main `ORDER BY c.date_created, c.car_id` sorted by `date_created` or `car_id`. Because the main thing for me is that it just needs to be sorted by date – James Aug 09 '20 at 03:13
  • I cannot change the group by, because it needs to be grouped in that specific order. First car_id then date_created – James Aug 09 '20 at 03:16
  • @James - Think of `GROUP BY` and `ORDER BY` each combining the columns specified, then using the combination to do the grouping and sorting. Think of the `ORDER BY` like of a list ordered by `lastname, firstname`. It's the pair, not one or the other. – Rick James Aug 09 '20 at 03:17
  • @what you have is date ordered, with the cars for a single date in some, possibly random, order. What is have is date ordered, with the cars in id order within each date. That is, I kept your ordering and your grouping. But mine specifies what to do with the secondary ordering by car. – Rick James Aug 09 '20 at 03:19
  • What if 2 `car_id`'s have the same datetime. To my understanding, if we group by the `car_id` we can never go wrong. But if we group my datetime first, then there could be problems if datetime is the same – James Aug 09 '20 at 03:24
  • The explanation of the sql statement is here. Where outfit is car. items_in_outfit is parts_in_car. https://stackoverflow.com/questions/62990223/query-concatenates-everything-into-one-row-rather-than-into-separate-rows – James Aug 09 '20 at 03:27
  • What about `EXPLAINs`, _with all columns_ (except maybe "possible keys") for the various suggestions in the Answers. There are often surprise clues. – Rick James Aug 09 '20 at 03:33
  • I made a couple of minor comments on my index suggestions. – Rick James Aug 09 '20 at 03:43
0

For this query:

SELECT GROUP_CONCAT(p.SELECT GROUP_CONCAT(p.manufacturers_id ORDER BY p.manufacturers_id) as mids
FROM car c INNER JOIN
     parts_in_car pic
     ON c.car_id = pic.car_id JOIN
     parts p
     ON pic.part_id = p.part_id JOIN
     customers cus
     ON c.cus_id = cus.cus_id 
WHERE cus.driver_licence_id = 5555555
group by c.car_id, c.date_created
ORDER BY c.date_created 

You want indexes on:

  • customers(license_id, cus_id)
  • car(cus_id, car_id, date_created)
  • parts_in_car(car_id, part_id)
  • parts(part_id, manufacturers_id).
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
0

The index strategy depends on your workload (i.e. most executed queries) and the level of cardinality of your columns (i.e. duplicated values in columns).

Indexes are used by all your workload (i.e queries executed). They are not meant for just some specific queries. The reason for this is that if you have too many queries the engine may choose indexes incorrectly for some other queries and cause slowness for them while helping to improve that one query you had in mind when designing the query. If that one query is one your most executed queries, then it worth the trade-off of course.

The engine will prefer indexes that have low cardinality compared to cardinality in the table or other indexes. So, when you are combining multiple columns into an index (composite index), you need to consider how much each column increases the cardinality of the index.

https://www.mysqltutorial.org/mysql-index/mysql-index-cardinality/

Make sure you have clustered index on your tables (i.e. primary key) and make it single column numeric primary key (as opposed to composite primary key or non-numeric primary key). This primary key will be automatically included all your indexes. The engine need some way of mapping from your indexes to the tables and primary key serves this purpose inside the secondary indexes. If you don't have a primary key, the engine will try find some uniqueness in the columns and thus less efficiency.

https://www.mysqltutorial.org/mysql-index/mysql-clustered-index/

Covering indexes are the one that includes all columns from a table necessary for the execution of the query. When there is covering index, the engine will use it instead table b/c it has all the columns to complete the query. Even if the cardinality of the covering index is the same as the table, this may still be preferred because since it has less number of columns than the original table, the index size will be small (thus less I/O).

https://blog.toadworld.com/2017/04/06/speed-up-your-queries-using-the-covering-index-in-mysql

So, without knowing all these parameters, it is hard to design your indexes. I can suggest the following covering indexes to test but if any the column is increasing cardinality dramatically, add another index without that column.

### Assuming `cust_id`, `car_id`, `part_id` are primary keys 
### Assuming `(car_id, part_id)` is the composite primary key of `parts_in_car` 
### (with the assumption that it is always used as many-many relationship table) 
### (thanks to @rick-james for pointing out many-to-many PK tip) 

# Customer
CREATE INDEX customer_driver_licence_id_idx
ON customer (driver_licence_id);

# cars
CREATE INDEX cars_cus_id_idx
ON cars (cus_id, date_created);

# parts
CREATE INDEX parts_manufacturers_id_idx
ON parts (manufacturers_id);
K4M
  • 1,030
  • 3
  • 11
  • Here is cookbook on generating indexes: http://mysql.rjweb.org/doc.php/index_cookbook_mysql – Rick James Aug 09 '20 at 03:12
  • Int primary key -- a many:many table is a common example of where a composite, possibly varchar, primary key is much better than an artificial INT. This comment applies to `parts_in_car`. Notice how all the answers suggest a composite PK for that table. – Rick James Aug 09 '20 at 03:24
  • @RickJames you're right on "parts_id_car" if it is a always used as "many-to-many mapping" table (i.e. joins used all of its columns that makes up the composite key --- if partially used, that's another story). updated my answer – K4M Aug 09 '20 at 03:42
0

Could it be faster with subselects?
Here some pseudocode:

select manufacturer_id from manufacturer_parts where part_id in (select part_id from part_car where car_id in (select car_id from cars where driverlicense = ?)

it-person
  • 98
  • 1
  • 11