Determine completeness of data supplied to RDBMS in multiple stages

Question

Currently, I am writing an application that collects data in multiple stages. Suppose the entity: person. I might collect a portion of the data in one stage: name, address. And in some later state I might collect the supplementary the data: age, gender. The data collected can vary for each stage.

I really prefer using non-nullable columns to ensure data availability, which applies to most of these columns. However, this seems rather difficult in the described situation, since to be created requires all non-null columns to contain data. (In addition, some columns should be nullable in some situations in the future).

My current plan
Make every column nullable and programmatically decide if all columns which are ought to be not null are actually not null. Possibly, this state could be saved in an extra column such as is_complete: boolean.

My question
I am mostly curious to other references regarding this problem: has this or a similar problem been encountered before and how was it approached? Generally, I am looking for alternative approaches to my current plan or wondering what issues I could encounter in this proposal.

I plan to use postgres or mysql. Approaches to eighter DBMSs are welcome. I have been considering a nosql approach due to its somewhat unstructured nature but would rather not use it. May it be relevant: I will likely be using a ORM for managing my entites (typeorm)

Any advice is greatly appreciated.

The columns you describe basically contain text. Why not put an empty text in columns for which you don't have a value yet? Basically, in MySQL: `NOT NULL DEFAULT ''`. I think that even just `NOT NULL` would do the same. It is possible you want to know which columns actually have received a value, in that case the NULL value can be useful. — KIKO Software, Feb 10 '22 at 23:56
@KIKOSoftware Thanks for your reply and for mentioning it. The data _should_ be able to support any native datatypes in the regarding DBMS. I edited my question to reflect this by changing `phone` to `age`, which could be a `INT`. I will keep an eye open for an empty string (`''`) as a solution, but generally would like to avoid this approach. For `INT`, this would probably mean applying `DEFAULT 0`, which depending on the column's context could have different meaning. — sander, Feb 11 '22 at 00:03
The problem with the ```is_complete``` column is located in adding additional columns. Then you have to update all rows. I check all needed columns, if I need it and I suppose this is a common approach. — Mario, Feb 11 '22 at 06:37
@Mario That is something I did not think about, thank you! I was also thinking about making `is_complete` a virtual column. — sander, Feb 11 '22 at 11:33

Determine completeness of data supplied to RDBMS in multiple stages

0 Answers0