0

I have a table

name: order with id, product_id, comment

now I want to add a state

new table: order_state 1 -> finished 2 -> started

etc

then add a field order_state_id in the table order

in what way do I have to worry about performance?

does this always perform well or what is the case where it wont? e.g. i mean when doing joins etc with a lot of orders, say 200'000 orders

i have used mysql views before and they were horrible the view I created contained obviously several joins. Is this not a related problem?

Community
  • 1
  • 1
Toskan
  • 13,911
  • 14
  • 95
  • 185
  • 3
    http://stackoverflow.com/questions/2623852/why-are-joins-bad-when-considering-scalability/2623979#2623979 – Joel Coehoorn Sep 29 '16 at 13:33
  • In this particular case, a separate table that stores just order_state which can take only a limited set of values would be wastefull and not high performance. The same data can be saved in the orders column just as easily. The level of redundancy will be exactly the same. – e4c5 Sep 29 '16 at 13:34
  • The order_state_id is a low-cadinality domain. You can implement it as a table (as in your question) or as a `CHECK` constraint, or as a true domain (don't know if mysql supports these), or as an enum (ditto). In all cases the storage requirements will typically be sizeof int, the domain-table version will probably imply a a supporting index for the foreign key to speed up cascades. Performance should not be a problem (the low cardinality will enable hash joins/lookups) (if mysql has these) – joop Sep 29 '16 at 13:45
  • @joop - MySQL allows the `CHECK` syntax, but ignores it. – Rick James Sep 29 '16 at 23:21

2 Answers2

1

Not an answer, just too big for a comment

In addition to what have been said, consider partial indexes.

Some DB like Postgres and SQL Server allows you to create indexes that not only specifies columns but rows.

It seems that you will end up with a constant growing amount of orders with order_state_id equal to finished (2) a stable amount of orders with order_state_id equal to started (1)

If your business make use of queries like this

SELECT id, comment
  FROM order
 WHERE order_state_id =  1
  AND product_id = @some_value

Partial indexing allows you to limit the index, including only the unfinished orders

CREATE INDEX Started_Orders
          ON order(product_id)
       WHERE order_state_id = 1

This index will be smaller than the unfiltered contra part

Horaciux
  • 6,322
  • 2
  • 22
  • 41
0

Don't normalize order_state. Instead add this column

order_state ENUM('finished', 'started') NOT NULL

Then use it this way (for example):

SELECT ...
    WHERE order_state = 'finished'
    ...

An ENUM (with up to 255 options) takes only 1 byte. INT takes 4 bytes. TINYINT takes 1 byte.

Back to your question... There are good uses of JOIN and there are unnecessary uses.

Rick James
  • 135,179
  • 13
  • 127
  • 222
  • http://komlenic.com/244/8-reasons-why-mysqls-enum-data-type-is-evil/ i stopped reading at "3. It's impossible to add additional attributes or related info." because this would already have been a big killer for me before this – Toskan Sep 30 '16 at 01:22
  • If you need to code a client application that has a Combobox to select order_state. What is the best approach to filling it with values without the need of a recompile if a new state (like 'Canceled') is added? – Horaciux Sep 30 '16 at 01:23
  • @Toskan - You can have another table with that enum as the `PRIMARY KEY`. In it, you can have other column(s) with 'related info'. Adding a new option _used to be_ a costly option. Now it is almost instantaneous, regardless of table size. – Rick James Sep 30 '16 at 06:07
  • @Horaciux - I haven't done a "recompile" in 15 years. PHP, Perl, Java, VB, etc don't have a visible "compile" phase. Are you using C#, C++, or some other old language? Still, it can be 'automated', see next comment... – Rick James Sep 30 '16 at 06:10
  • With the extra table or with the information_schema, you can discover the list of "order_states" and programmatically build the Combobox. – Rick James Sep 30 '16 at 06:11
  • `information_schema.COLUMNS.COLUMN_TYPE` is `"enum('GREEN','YELLOW','RED')"` in one example I found. (OK, this I_S approach is rather messy.) – Rick James Sep 30 '16 at 06:16
  • There are 3 approaches - ENUM, TINYINT, VARCHAR. Each has pros and cons. You have to decide. (My answer is merely advocating for one of them.) – Rick James Sep 30 '16 at 06:18
  • @RickJames that would mean duplicating the Enums, right? so if i want to rename yellow to say yellow/orange i would have to do it at two places – Toskan Sep 30 '16 at 17:34
  • True, that's one of the drawbacks to `ENUM`. Enum is a 'bad' choice when there are hundreds of options or they change a lot. – Rick James Sep 30 '16 at 19:20