1

I have a MySQL database with a fairly large table where the products are. Each of them has its own id and categoryId field where there is a category id belongs to this product. Now I have a query that pulls out products from given categories such as:

SELECT * FROM products WHERE categoryId IN ( 1, 2, 3, 4, 5, 34, 6, 7, 8, 9, 10, 11, 12 )

Of course, come a WHERE clause and ORDER BY sort but not in this thing. Let's say that these products is 250k and the visits are over 100k per day. Under such conditions in the table slow_log registered weight of these queries with large generation time.

Do you have any ideas how to optimize the given problem?

Table engine is MyISAM.

Arkadiusz Kondas
  • 454
  • 8
  • 13

1 Answers1

2

Index on categoryId won't help in this case, IN (...) queries will produce sequence scan instead of index lookup anyway.

I would consider first redesigning the system to get rid of multiple category select and if it is not appropriate, caching query results.

For example, you can create a helper table items_category_groups(hash, item_id) and after client query on multiple categories hash their combined ids and lookup this table. If not found, make an expensive query and fill this table. If found, make a cheap query joining these tables. Other caching tools like memcached will work too.

Andrey
  • 993
  • 1
  • 11
  • 20
  • good idea but this query is used to pulls out product for given category and its subcategories (whole branch of this category) so it is imposible – Arkadiusz Kondas Mar 09 '10 at 10:35
  • Then caching is the only way I see, although the cache size will be large (250k * count of category combinations), so I would store them in DB, not in memcache or something. – Andrey Mar 09 '10 at 11:06
  • If you just want to pull out products from ONE category (and its subcategories), you can create a helper table (category_id, item_id) and fill it with all category_id-item_id pairs including subcategories. This table won't be very large (say, if you have category nesing level of 4, it will contain not more than 1m*number of categories two-int rows, it's not a big deal). Then you will fetch products with a single fast query using indexes. – Andrey Mar 09 '10 at 11:13
  • but this is the same. I will also must use IN on the helper table – Arkadiusz Kondas Mar 09 '10 at 11:29
  • No. For example, you have category 1 with subcategory 2 and 2 has a subcategory 3. Item 1 belongs to category 3. Then you will have 3 rows for one item in this table: 1 1 | 1 2 | 1 3 And if you want to select all items from, say, category 2, you write, roughly: SELECT items.* FROM items i, item_categories ic WHERE i.id=ic.item_id AND ic.category_id=2; And it will be equal to SELECT * FROM items WHERE category_id IN (2, 3); Of course, if you want to select items from two not related categories, this won't work, but in case of nested categories it will work just fine. – Andrey Mar 09 '10 at 11:55
  • after my reflection: when i want add product ,for example, to category 3 i must put in item_categories 3 rows ? 1 1 | 1 2 | 1 3 ? If yes the save proccess will take longer and i must generate tree for each save. – Arkadiusz Kondas Mar 09 '10 at 12:45
  • sure, but this is just one time on save, and even these 3 queries may work faster than one select with IN(). If you have much more selects than inserts, I guess this is worth it. – Andrey Mar 09 '10 at 13:30