0

I have 2 datasets in csv file that has to be imported into database: Clients and Income.

  • Table Clients has CUSTOMER_ID,....(other columns)
  • Table Income has INCOME_ID,CUSTOMER_ID....(other columns)

I dont know how should i organise tables and relations between them, as CUSTOMER_ID in Clients is not unique (there are data for the same client for different dates, so it has something like this: Clients table

The same is for table Income - multiple INCOME_ID's and CUSTOMER_ID's

Should i make CUSTOMER_ID as a PK or should create one more column? I ve read that PK should have only unique values. Please help, i am an intern and i dont have experience in it.

I am using SQLITE

Jane
  • 1
  • 1
  • What is the purpose of the table Clients? A table named Clients would typically contain 1 row for each client. – forpas Jun 27 '21 at 11:33
  • It contains historical data, so every time client changes its data (for exemple surname) new row is added – Jane Jun 27 '21 at 11:37
  • Then this table should be renamed to something like Clients_History. Create a table Clients with only 1 row for each client which will contain the client's unique id and any other column that contains info about the client's current status. Then CUSTOMER_ID from Clients_History and Income will both reference the id of Clients. – forpas Jun 27 '21 at 11:42
  • I have the same problem with other tables, that i wanted to relate with each other. Should i create table named for example DATA with CUSTOMER_ID, INCOME_ID, LOAN_ID ... ect? and how should i organise it? all as integers + additional column as PK (for example DATA_ID)? Or all as primary keys(is it possible?) , if it wont be PK/FK i wouldnt be able to set relations – Jane Jun 27 '21 at 11:52

1 Answers1

0

A Primary Key is unique by definition; that's just what the word "key" means here. So your question really becomes "do I need a Primary Key, and how would I add one?"

To answer that, you need to learn about "database normalisation", which is a set of rules for structuring data to ensure that each "fact" is stored only once. One of the key ideas in those rules is identifying which columns are "dependent" on which other columns.

For instance, in your Clients table, you presumably have data that is different between different rows for the same CustomerId (aside: shouldn't that be ClientId?) but which can be distinguished in some way. Those columns depend on both the CustomerId and some other column (version? date?), so the natural Primary Key is the combination of those two columns. A Primary Key with multiple columns is perfectly valid, but can be unwieldy to work with, so it's common in practice to add a "surrogate key", which is an arbitrary ID for every row in the table.

Where normalisation comes in is that you might also have columns that are always the same for the same CustomerId - maybe you have the date they first signed up, for instance. It wouldn't make sense to have multiple sign-up dates for the same client, so that column would depend only on the CustomerId column. Normalisation rules tell you that that belongs in a table where CustomerId is a unique column.

As you'll notice, normalisation is not something you can do one table at a time - it affects how you name things, and what columns you create - so you need to broaden your scope and plan out the data you need to store.

IMSoP
  • 89,526
  • 13
  • 117
  • 169
  • Yes, you perfectly caught my question ! :) I have different data for the same client and it has different date, all of the row are needed and should be added to the table. But I dont really understand what do you mean by surrogate key and how can i do it in sqlite? Do you mean that i have to add one more column (for example Client_tableid)? If yes, I understand that Client_tableid should be primary key and CustomerId (or better ClientId) is not a key? I tried to to that, but having CustomerId as not a key, I cannot set relation between tables – Jane Jun 27 '21 at 12:14
  • @Jana As I say in the last paragraph, you can't look at one table at a time, you have to design your entire schema. Referencing a ClientId from another table currently makes no sense - every time you joined, you'd get multiple rows. You need a separate table that *does* have one row per ClientId, and to decide what data belongs there. – IMSoP Jun 27 '21 at 12:30