6

I need help in database design.

I have following tables. DB schema

Pseudo code:

Table order_status {
  id int[pk, increment]
  name varchar
}

Table order_status_update {
  id int[pk, increment]
  order_id int[ref: > order.id]
  order_status_id int[ref: > order_status.id]
  updated_at datetime
}

Table order_category {
  id int[pk, increment]
  name varchar
}

Table file {
  id int[pk, increment]
  order_id int[ref: > order.id]
  key varchar
  name varchar
  path varchar 
}

Table order {
  id int [pk] // primary key
  order_status_id int [ref: > order_status.id]
  order_category_id int [ref: > order_category.id]
  notes varchar
  attributes json  // no of attributes is not fixed, hence needed a json column
}

Everything was okay, but now I need an auto-increment id for each type of order_category_id column.

For example, if I have 2 categories electronics and toys , then I would need electronics-1, toy-1, toy-2, electronics-2, electronics-3, toy-3, toy-4, toy-5 values associated with rows of order table. But it's not possible as auto-increment increments based on each new row, not column type.

In other words, for table order instead of

  id  order_category_id
---------------------
  1       1       
  2       1       
  3       1     
  4       2       
  5       1      
  6       2
  7       1

I need following,

 id  order_category_id pretty_ids
----------------------------
  1       1       toy-1
  2       1       toy-2
  3       1       toy-3
  4       2       electronics-1
  5       1       toy-4
  6       2       electronics-2
  7       1       toy-5

What I tried:

I created separate table for each order category (not an ideal solution but currently I have 6 order categories, so it works for now )

Now, I have table for electronics_order and toys_order. Columns are repetitive, but it works. But now I have another problem, my every relationship with other tables got ruined. Since, both electronics_order and toys_orders can have same id, I cannot use id column to reference order_status_update, order_status, file tables. I can create another column order_category in each of these tables, but will it be the right way? I am not experienced in database design, so I would like to know how others do it.

I also have a side question.

Do I need tables for order_category and order_status just to store names? Because these values will not change much and I can store them in code and save in columns of order table.

I know separate tables are good for flexibility, but I had to query database 2 times to fetch order_status and order_category by name before inserting new row to order table. And later it will be multiple join for querying order table.

--

If it helps, I am using flask-sqlalchemy in backend and postgresql as database server.

Jashwant
  • 28,410
  • 16
  • 70
  • 105
  • 2
    "*now I need an auto-increment id for each order_category*" - which table are you talking about, `order`? And then you would make `order_category` and `id` the primary key together? And add extra columns in the `file` and `order_status_update` tables so that you can reference the composite key? Doesn't sound like a good idea. At all. – Bergi Nov 13 '19 at 20:48
  • I think what you are really looking for is a query like `SELECT name || '-' || ROW_NUMBER() OVER (PARTITION BY order_category_id ORDER BY id ASC)` – Bergi Nov 13 '19 at 20:53
  • "*Do I need tables for order_category and order_status just to store names?*" - you could also use enums for them, but there's nothing wrong with storing the names. "*I had to query database 2 times to fetch their ids by name before inserting new row to order table*" - you can do that in a single `INSERT` statement with nested queries. Alternatively, closer to the enum solution, assign them fixed ids and just pass those. – Bergi Nov 13 '19 at 20:56
  • 1
    Alternatively you could have a trigger ```ON INSERT``` manipulating the inserted row (adding the "-n" to the end). – Islingre Nov 13 '19 at 21:19
  • @Bergi, can you explain your query ? May be write an answer ? I don't need multiple insert because `order_category` and `order_status` tables are already filled. I just need to get `order_category_id` and `order_status_id` by passing their name, so I can insert in `order` table without passing fixed ids. Fixed ids are confusing in code. – Jashwant Nov 14 '19 at 05:12
  • @Islingre, will it work in race condition ? Can you write an answer with code ? – Jashwant Nov 14 '19 at 05:12
  • @Jashwant I mean like `INSERT INTO orders VALUES (DEFAULT, (SELECT order_status_id FROM order_status WHERE name = $1), (SELECT order_category_id FROM order_category WHERE name = $2), $3, '{}');` – Bergi Nov 14 '19 at 10:51
  • But it is like a join. Isnt it ? Is it worth having separate table and add this complexity rather than storing names in code and storing values in column ? – Jashwant Nov 14 '19 at 12:02
  • 1
    @Jashwant I guess you might need to serialize it.. But if you would additionally create a ```UNIQUE``` constraint on ```(id, order_category)``` this should work race conditions as well – Islingre Nov 14 '19 at 17:02
  • 1
    @Jashwant Yes, it is like a join. Don't think there's anything wrong with it. The benefit over storing string values directly in `order` table would be reduced disk size and the ability to change names easily. Not sure what you mean by "*storing names in code*". – Bergi Nov 17 '19 at 00:08
  • 'storing names in code' meant 'storing name and ids in a dictionary / object` in python code' – Jashwant Nov 17 '19 at 06:08
  • 2
    This can be easily achieved using row_number() with partition by in a select statement, but storing it as a part of your table is a bit more tricky. I've answered [a question like this for SQL Server](https://stackoverflow.com/a/45694913/3094533), but I don't know how to do that on postgreSQL. Hope that helps. – Zohar Peled Nov 17 '19 at 07:16
  • I would highly suggest not naming a table `order` – Haleemur Ali Nov 21 '19 at 03:39
  • @HaleemurAli, Because it's a keyword ? How can it cause issue ? I am going to use ORM – Jashwant Nov 21 '19 at 06:59
  • For my opinion model a little bit wrong, because in "order" you should use "order_status_update_id" instead of "order_status_id". And when you insert row to "order_status_update" you update "order_status_update_id" with new id. – Vladimir Nov 21 '19 at 09:26
  • 1
    @Jashwant, yes because it is a keyword. it would need to be quoted in every query that is run. for usage through the ORM this is not bad, however anytime a data analyst / engineer has to interact with this data, it will be a little pain point. If you are using the ORM, the actual table name does not matter, so you someone else's light better sacrificing no functionality immediately – Haleemur Ali Nov 21 '19 at 16:15
  • @HaleemurAli, Fair enough, I am making table names plural now. – Jashwant Nov 21 '19 at 17:49
  • 1
    I'm sensing X,Y problem. Can you tell us some background why you need this change? A `Category` is a category which can have many Items. An order has items and the items within the order may or may not belong to more than one category. Either your used terminology is an issue or converting each item to it's own category i.e. `Banana -1, Banana-2` will defo be an issue in the future. All bananas should come onder `fruits` category. - I'm just thinking from another perspective. – Krish Nov 22 '19 at 12:13

6 Answers6

5

In order to track the increment id which is based on the order_category, we can keep track of this value on another table. Let us call this table: order_category_sequence. To show my solution, I just created simplified version of order table with order_category.

CREATE TABLE order_category (
  id SERIAL PRIMARY KEY,
  name  VARCHAR(100) NULL
); 


CREATE TABLE order_category_sequence (
  id SERIAL PRIMARY KEY,
  order_category_id int NOT NULL,
  current_key  int not null
);

Alter Table order_category_sequence Add Constraint "fk_order_category_id" FOREIGN KEY (order_category_id) REFERENCES order_category (id);
Alter Table order_category_sequence Add Constraint "uc_order_category_id" UNIQUE (order_category_id);


CREATE TABLE "order" (
  id SERIAL PRIMARY KEY,
  order_category_id int NOT NULL,
  pretty_id  VARCHAR(100)  null
);

Alter Table "order" Add Constraint "fk_order_category_id" FOREIGN KEY (order_category_id) REFERENCES order_category (id);

The order_category_id column in order_category_sequence table refers the order_category. The current_key column holds the last value in order.

When a new order row is added, we can use a trigger to read the last value from order_category_sequence and update pretty_id. The following trigger definition can be used to achieve this.

--function called everytime a new order is added
CREATE OR REPLACE FUNCTION on_order_created()
  RETURNS trigger AS
$BODY$

DECLARE 
current_pretty_id varchar(100);

BEGIN

-- increment last value of the corresponding order_category_id in the sequence table
Update order_category_sequence
set current_key = (current_key + 1)
where order_category_id = NEW.order_category_id;

--prepare the pretty_id
Select 
oc.name || '-' || s.current_key AS   current_pretty_id 
FROM    order_category_sequence AS s
JOIN order_category AS oc on s.order_category_id = oc.id
WHERE s.order_category_id = NEW.order_category_id
INTO current_pretty_id;

--update order table
Update "order"
set pretty_id = current_pretty_id
where id = NEW.id;


RETURN NEW;
END;
$BODY$ LANGUAGE plpgsql;


CREATE TRIGGER order_created
  AFTER INSERT
  ON "order"
  FOR EACH ROW
  EXECUTE PROCEDURE on_order_created();

If we want to synchronize the two table, order_category and order_category_sequence, we can use another trigger to have a row in the latter table every time a new order category is added.

//function called everytime a new order_category is added
CREATE OR REPLACE FUNCTION on_order_category_created()
  RETURNS trigger AS
$BODY$

BEGIN
--insert a new row for the newly inserted order_category
Insert into order_category_sequence(order_category_id, current_key)
values (NEW.id, 0);

RETURN NEW;
END;
$BODY$ LANGUAGE plpgsql;


CREATE TRIGGER order_category_created
  AFTER INSERT
  ON order_category
  FOR EACH ROW
  EXECUTE PROCEDURE on_order_category_created();

Testing query and result:

Insert into order_category(name)
values ('electronics'),('toys');

Insert into "order"(order_category_id)
values (1),(2),(2);


select * from "order";

enter image description here

Regarding your side question, I prefer to store the lookup values like order_status and order_category in separate tables. Doing this allows to have the above flexibility and it is easy when we have changes.

Yared
  • 2,206
  • 1
  • 21
  • 30
2

To answer your side question: yes, you should keep tables with names in them, for a number of reasons. First of all, such tables are small and generally kept in memory by the database, so there is negligible performance benefit to not using the tables. Second, you want to be able to use external tools to query the database and generate reports, and you want these kind of labels available to those tools. Third, you want to minimize the coupling of your software to the actual data so that they can evolve independently. Adding a new category should not require modifying your software.

Now, to the main question, there is no built-in facility for the kind of auto-increment you want. You have to build it yourself.

I suggest you keep the sequence number for each category as a column in the category table. Then you can update it and use the updated sequence number in the order table, like this (which is specific to PostgreSQL):

-- set up the tables

create table orders (
  id SERIAL PRIMARY KEY,
  order_category_id int,
  pretty_id VARCHAR
);
create unique index order_category_pretty_id_idx 
  on orders (pretty_id);

create table order_category (
  id SERIAL PRIMARY KEY,
  name varchar NOT NULL,
  seq int NOT NULL default 0
);

-- create the categories
insert into order_category
(name) VALUES
('toy'), ('electronics');


-- create orders, specifying the category ID and generating the pretty ID

WITH 
  new_category_id (id) AS (VALUES (1)), -- 1 here is the category ID for the new order
  pretty AS (
    UPDATE order_category 
    SET seq = seq + 1
    WHERE id = (SELECT id FROM new_category_id)
    RETURNING *
  )
INSERT into orders (order_category_id, pretty_id)
SELECT new_category_id.id, concat(pretty.name, '-', pretty.seq) 
FROM new_category_id, pretty;

You just plug in your category ID where I have 1 in the example and it will create the new pretty_id for that category. The first category will be toy-1, the next toy-2, etc.

| id  | order_category_id | pretty_id     |
| --- | ----------------- | ------------- |
| 1   | 1                 | toy-1         |
| 2   | 1                 | toy-2         |
| 3   | 2                 | electronics-1 |
| 4   | 1                 | toy-3         |
| 5   | 2                 | electronics-2 |

Old Pro
  • 24,624
  • 7
  • 58
  • 106
  • I want toy-1 and toy-2 associated with rows of `order` table not `order_category` table. I have updated my question and added column `pretty_ids` to better explain this. Is your answer still the same and I should store as toy-1324 and toy-9009 ? Also, can you answer my side question ? – Jashwant Nov 16 '19 at 04:34
  • @Jashwant I answered your side question. I still don't understand why you need multiple versions of the 'toy' category, though. I don't understand your changes, either, as the `order` table should have `order_category_id` not `order_category`, and the pretty id should be in the `order_category` table. Either toy-1 and toy-2 are the same and you can/should just use toy, or they are different, and you should give them meaningfully different names, like toy-mechanical and toy-electronic. – Old Pro Nov 16 '19 at 23:23
  • It was `order_category_id` in order table. Fixed it now. Orders of different category are displayed on different pages on front end. On each page , I need sequencial order ids e.g. toy-1, toy-2 , toy-3, toy-4 instead of toy-2 , toy-5 , toy-6 ( as primary id 1, 3, 4 are taken by orders of different order category ). Pretty ids cannot be in `order_category` table. `Order_category` table will have a fixed number of rows. `Order` table has many to one relationship with `order_category` table. – Jashwant Nov 17 '19 at 06:25
  • 1
    @Jashwant Now that I have a better understanding of what you are trying to accomplish, I have updated my answer accordingly. I think I have completely answered your question and side question. If not, please let me know. – Old Pro Nov 19 '19 at 16:19
  • @OldPro Interesting solution in Postgres, you use `update order_category` as a lock, It means that it is really decrease capabilities of DB server and I would mention it in the answer. – Vladimir Nov 27 '19 at 13:33
1

In order to do toys-1 toys-2 toys-3 you should repeat the logic of order_status update, There is no difference between track some status by time or by count. Just in the order_status update it is simpler you just put now() into updated_at for lets say order_category_track you would take last value + 1 or have different sequences respectively category (would not recommend to do like this because it binds database objects with data in the DB). I would change a schema to: enter image description here

In this schema might be in inconsistent state. But in my opinion in your application there are three different entities "order","order_status","order category track" which live their own lives.

And still it is almost impossible to achieve consistent state for this task with out locks for example. This task is complicated by condition that next rows depends on previous what contradicts with SQL.

Vladimir
  • 135
  • 2
  • 7
0

I would suggest to split category into 2-level hierarchy: category (toy, electronic) and subcategory (toy-1, toy-2, electronic-1, etc.):

So you can use column order_subcategory.full_name contain compiled "toy-1" value, or you can create view to make this field on the fly:

select oc.name || "-" || os.number
from order_category as oc 
join order_subcategory as os on oc.id = os.category_id

enter image description here

https://dbdiagram.io/d/5dd6a132edf08a25543e34f8

Regarding your questions "Do I need tables for order_category and order_status just to store names?": It is best practice to store this kind of data as a separate dictionary table. It gives you consistency and reliability. Querying those tables is very fast and easy for RDBMS, so feel free to use it.

Alex
  • 731
  • 1
  • 6
  • 21
  • 1
    Yes this is the way to go and I would rename the table `order_subCategory` to just `Items`, `order_category => itemCategory`. Just that it makes more sense when reading :) – Krish Nov 22 '19 at 12:18
  • You will add calculated data in your RDBMS. This will cause you harm in the future on evolution/maintenance I think. The schema is good I think. But do not put full name in it ;) (but if you are on PG12: maybe you can use generated columns. Less human control, more maintainability. – Jaisus Nov 22 '19 at 14:23
  • As I mentioned, this field can be compiled in view, if that is easier for use. Otherwise populating table with procedure should resolve the issue. – Alex Nov 22 '19 at 14:26
0

I'll focus on only 3 tables you showed: order, order_status and order_category. Creating a new table for a new record is not the right way. As your explanation, I think you trying to use order and order_category tables as many to many relationship. If it's so, the thing you need is a pivot table like this: table relationship

I currently add order_status column in order table, you can add this column one of these tables as your need.
side question: for order_status, if order status is fixed,( like only ACTIVE,INACTIVE and it won't be more values in the future) it would be better to use a column with ENUM type.

0

The easy answer would be to answer directly to your question. But I do not think it is a good thing in this case. So I will do otherwise. I think that maybe the whole conception is wrong.

First things first : clarification of your business needs and assertions.

  • One order can have multiple categories
  • One category can concern multiple orders

  • One order can only have one status at a time but multiple through time

  • One status can be used by multiple orders

  • One order correspond to a file (probably a billing proof)

  • One file concerns only one order

Second : Advices

Third : conception solution enter image description here

This normally is enough to give you a good start. But I wanna have fun a little more :) So...

Fourth : interrogation on needed performance

  • estimation of load per day/month in order (ten million rows per month?)

Fifth : physical solution proposition

The finaaaaaal code-down ! (with the good music)

-- as a postgres user
CREATE DATABASE command_system;
CREATE SCHEMA in_prgoress_command;
CREATE SCHEMA archived_command;
--DROP SCHEMA public;
-- create tablespaces on other location than below
CREATE TABLESPACE command_indexes_tbs location 'c:/Data/indexes';
CREATE TABLESPACE archived_command_tbs location 'c:/Data/archive';
CREATE TABLESPACE in_progress_command_tbs location 'c:/Data/command';

CREATE TABLE in_prgoress_command.command
(
    id bigint /*or bigserial if you use a INSERT RETURNING clause*/ primary key
    , notes varchar(500)
    , fileULink varchar (500)
)
TABLESPACE in_progress_command_tbs;

CREATE TABLE archived_command.command
(
    id bigint /*or bigserial if you use a INSERT RETURNING clause*/ primary key
    , notes varchar(500)
    , fileULink varchar (500)
)
TABLESPACE archived_command_tbs;

CREATE TABLE in_prgoress_command.category
(
    id int primary key
    , designation varchar(45) NOT NULL
)
TABLESPACE in_progress_command_tbs;
INSERT INTO in_prgoress_command.category 
VALUES (1,'Toy'), (2,'Electronic'), (3,'Leather'); --non-exaustive list

CREATE TABLE in_prgoress_command.status
(
    id int primary key
    , designation varchar (45) NOT NULL
)
TABLESPACE in_progress_command_tbs;

INSERT INTO in_prgoress_command.status 
VALUES (1,'Shipping'), (2,'Cancel'), (3,'Terminated'), (4,'Payed'), (5,'Initialised'); --non-exaustive list

CREATE TABLE in_prgoress_command.command_category
(
    id bigserial primary key
    , idCategory int 
    , idCommand bigint
)
TABLESPACE in_progress_command_tbs;

ALTER TABLE in_prgoress_command.command_category
ADD CONSTRAINT fk_command_category_category FOREIGN KEY (idCategory) REFERENCES in_prgoress_command.category(id);

ALTER TABLE in_prgoress_command.command_category
ADD CONSTRAINT fk_command_category_command FOREIGN KEY (idCommand) REFERENCES in_prgoress_command.command(id);

CREATE INDEX idx_command_category_category ON in_prgoress_command.command_category USING BTREE (idCategory) TABLESPACE command_indexes_tbs;
CREATE INDEX idx_command_category_command ON in_prgoress_command.command_category USING BTREE (idCommand) TABLESPACE command_indexes_tbs;

CREATE TABLE archived_command.command_category
(
    id bigserial primary key
    , idCategory int 
    , idCommand bigint
)
TABLESPACE archived_command_tbs;

ALTER TABLE archived_command.command_category
ADD CONSTRAINT fk_command_category_category FOREIGN KEY (idCategory) REFERENCES in_prgoress_command.category(id);

ALTER TABLE archived_command.command_category
ADD CONSTRAINT fk_command_category_command FOREIGN KEY (idCommand) REFERENCES archived_command.command(id);

CREATE INDEX idx_command_category_category ON archived_command.command_category USING BTREE (idCategory) TABLESPACE command_indexes_tbs;
CREATE INDEX idx_command_category_command ON archived_command.command_category USING BTREE (idCommand) TABLESPACE command_indexes_tbs;

CREATE TABLE in_prgoress_command.command_status
(
    id bigserial primary key
    , idStatus int 
    , idCommand bigint
    , change_timestamp timestamp --anticipate if you can the time-zone problematic
)
TABLESPACE in_progress_command_tbs;

ALTER TABLE in_prgoress_command.command_status
ADD CONSTRAINT fk_command_status_status FOREIGN KEY (idStatus) REFERENCES in_prgoress_command.status(id);

ALTER TABLE in_prgoress_command.command_status
ADD CONSTRAINT fk_command_status_command FOREIGN KEY (idCommand) REFERENCES in_prgoress_command.command(id);

CREATE INDEX idx_command_status_status ON in_prgoress_command.command_status USING BTREE (idStatus) TABLESPACE command_indexes_tbs;
CREATE INDEX idx_command_status_command ON in_prgoress_command.command_status USING BTREE (idCommand) TABLESPACE command_indexes_tbs;
CREATE UNIQUE INDEX idxu_command_state ON in_prgoress_command.command_status USING BTREE (change_timestamp, idStatus, idCommand) TABLESPACE command_indexes_tbs;

CREATE OR REPLACE FUNCTION sp_trg_archiving_command ()
    RETURNS TRIGGER
language plpgsql
as $function$
DECLARE
BEGIN
    -- Copy the data
    INSERT INTO archived_command.command
    SELECT *
    FROM in_prgoress_command.command
    WHERE new.idCommand = idCommand;    

    INSERT INTO archived_command.command_status (idStatus, idCommand, change_timestamp)
    SELECT idStatus, idCommand, change_timestamp
    FROM in_prgoress_command.command_status
    WHERE idCommand = new.idCommand;    

    INSERT INTO archived_command.command_category (idCategory, idCommand)
    SELECT idCategory, idCommand
    FROM in_prgoress_command.command_category
    WHERE idCommand = new.idCommand;    

    -- Delete the data
    DELETE FROM in_prgoress_command.command_status
    WHERE idCommand = new.idCommand;    
    DELETE FROM in_prgoress_command.command_category
    WHERE idCommand = new.idCommand;    
    DELETE FROM in_prgoress_command.command
    WHERE idCommand = new.idCommand;    
END;
$function$;

DROP TRIGGER IF EXISTS t_trg_archiving_command ON in_prgoress_command.command_status;
CREATE TRIGGER t_trg_archiving_command
AFTER INSERT
ON in_prgoress_command.command_status
FOR EACH ROW
WHEN (new.idstatus = 2 or new.idStatus = 3)
EXECUTE PROCEDURE sp_trg_archiving_command();

CREATE TABLE archived_command.command_status
(
    id bigserial primary key
    , idStatus int 
    , idCommand bigint
    , change_timestamp timestamp --anticipate if you can the time-zone problematic
)
TABLESPACE archived_command_tbs;

ALTER TABLE archived_command.command_status
ADD CONSTRAINT fk_command_command_status FOREIGN KEY (idStatus) REFERENCES in_prgoress_command.category(id);

ALTER TABLE archived_command.command_status
ADD CONSTRAINT fk_command_command_status FOREIGN KEY (idCommand) REFERENCES archived_command.command(id);

CREATE INDEX idx_command_status_status ON archived_command.command_status USING BTREE (idStatus) TABLESPACE command_indexes_tbs;
CREATE INDEX idx_command_status_command ON archived_command.command_status USING BTREE (idCommand) TABLESPACE command_indexes_tbs;
CREATE UNIQUE INDEX idxu_command_state ON archived_command.command_status USING BTREE (change_timestamp, idStatus, idCommand) TABLESPACE command_indexes_tbs;

Conclusion:

  • In many cases, when you are worried by the disposition of your keys it is because they are not in the good place. Same goes for cars ones! :D
  • Do not take any solution as prophetic solution : benchmark it.
Jaisus
  • 1,019
  • 5
  • 14
  • Client frequently adds / removes fields in order form. Those needs to be saved in order table. These fields are not common for all order categories, some have them, some do not. I cannot increase columns in table again and again. So, I am saving them in a 1 JSON column. It's schema less and hence works for me. Other solution is to create a table for these fields and pass field name and order id to it, but it's postgresql , why not use JSON ? – Jashwant Nov 22 '19 at 13:27
  • 1
    One of the first rule in RDBMS conception is to have atomic data in the database. But if this is data where columns are addable by client I suppose that you do not have other choice ;) (a lot of people put anything in json columns without thinking about normal forms. this is why I made the remark ;) ) – Jaisus Nov 22 '19 at 14:07