-1

We have Purchase order management data as mentioned below. The order comes from different sources Retail, Wholesale, and online. Assume the addition of source is infrequent. Assume we don't have information on source ids as it is coming from a different system. it is just a plain text

PURCHASE_ID RETAIL_ID WHOLE_SALE_ID ONLINE_ID ITEM_ID QUANTITY AMOUNT PURCHASE_DATE
P1001 R384894 NULL NULL IT541 45 450 11/12/2022
p1002 NULL W6479 NULL IT541 50 500 15/12/2022
P1003 NULL NULL 035822 IT541 10 100 20/02/2022
P1004 37828 NULL NULL IT679 500 1500 18/06/2023
P1005 NULL W64757 NULL IT230 23 679 18/06/2023
P1006 NULL NULL O5363 IT222 67 4784 18/06/2023

A purchase ID refers to a purchase from a single source. For example, PURCHASE_ID P1001 is the purchase done by RETAIL STORE ID R38494 so the other two columns(WHOLE_SALE_ID, ONLINE_ID) are null. ITEM_ID is the ITEM number being ordered. We are having 10 million records in the database and the table is partitioned by PURCHASE_DATE.

I can think of two DB Schema designs options. Would like to know which is the correct way to design the table for faster data retrieval and performance. Please let me know other options if any

Option 1: Three Columns that allow NULLS with needed indexes

Purchase:

PURCHASE_ID (Primary Key)
RETAIL_ID 
WHOLESALE_ID
ONLINE_ID 
ITEM_ID (Foreign Key referencing Items table)
QUANTITY
AMOUNT
PURCHASE_DATE


Items:

ITEM_ID (Primary Key)
ITEM_NAME
Other item-related information
Indexes:

Index on PURCHASE_DATE column in the Purchase table.
Index on ITEM_ID column in the Purchase table.
Indexes on  columns RETAIL_ID , WHOLESALE_ID , ONLINE_ID

enter image description here

Option 2: Composite ID with needed index

Purchase:

PURCHASE_ID (Primary Key)
SOURCE_TYPE_ID (Foreign Key referencing SourceTypes table)
SOURCE_ID 
ITEM_ID (Foreign Key referencing Items table)
QUANTITY
AMOUNT
PURCHASE_DATE

SourceTypes:
SOURCE_TYPE_ID (Primary Key)
SOURCE_TYPE_NAME (e.g., "Retail", "Wholesale", "Online")

Items:

ITEM_ID (Primary Key)
ITEM_NAME
Other item-related information
Indexes:

Index on PURCHASE_DATE column in the Purchase table.
Index on ITEM_ID column in the Purchase table.
Indexes on composite columns (e.g., SOURCE_TYPE_ID, SOURCE_ID) in the Purchase table.

enter image description here

  • 1
    I assume you are gathering this data for central analysis/reporting. So you might want to consider a schema that lends itself to analysis, since you already need some degree of data engineering to get the data in. If two of your three ID columns will always be NULL, then I would suggest putting all of these values into one column, and adding a seperate additional field to indicate what channel it was sold through (Retail, Wholesale, Online). Could I ask that you change your pictures to a textual description (I can't see pictures on my corporate network) – Nick.Mc Aug 15 '23 at 06:11
  • [Strategy for “Which is better” questions](https://meta.stackexchange.com/q/204461) [Why was my "shopping list" question closed?] [ask] [Help] (https://meta.stackexchange.com/q/158809/266284) [Why should I not upload images of code/data/errors?](https://meta.stackoverflow.com/q/285551/3404097) An ERD is an image of DDL. When an image is appropriate don't use transparency, SO has both white & black background modes. – philipxy Aug 15 '23 at 07:00

0 Answers0