0

I have a database with the data of many experiments. These experiments contain some generic attributes, but also some experiment-specific attributes. As a small example, the database can look like this:

TABLE 1:

ID attr1 attr2 exp_type more_attrs
1 x y type1 1
2 x2 y type1 2
3 x z type2 1

type1:

ID extra_attr1 extra_attr2
1 a c
2 ab c

type2:

ID other_attr1 other_attr2 other_attr3
1 d e f

Where all the experiments are stored in a general table, and have some specific extra information based on the experiments.

Currently, when I want to join these tables on a query, I fetch the exp_type and merge on a table based on that value. But as well supported joining on primary keys is, this way of working feels not as intended..

As there could always come new experiment types, adding a column for a reference to every possible type is not really an option.

So my question basically is: Is this the way to go? Are there better ways to reference tables? Or is the structure of my database just badly designed? And what would be a better design?

  • 1
    Kinda opinion based, but I would avoid 'horizontal' structures (attr1, attr2 ... attrN columns)) - create table for attributes, another table for experiment attributes relation, which contain links (FK) to experiments table and attributes table (and maybe some other information, related to specific experiment and attribute combination). – Arvo Jun 18 '21 at 09:24
  • If I understand correctly you mean an attributes table where every row conains a PK, an attribute name and an attribute value? And how would you store the attribute names in this case? – DwightFromTheOffice Jun 18 '21 at 09:41
  • Two tables - one for attributes (PK, name, description + possibly type, kind etc) (at least for generic ones, but I would include all possible specific ones OR one attribute for specifics - depend on real needs, like applicabilty of specific attributes to multiple experiments) and another for attribute values for experiments (linked to experiments and attributes). IF you need some kind of "templates" for different experiment types, then more tables may be needed. – Arvo Jun 18 '21 at 09:52
  • Thanks for the explanation, since this would solve my problem (seems more like the intended way) and I think this is more generally applicable I would be happy to accept an answer with this solution. – DwightFromTheOffice Jun 18 '21 at 09:57

2 Answers2

1

If you want to be able to add new types, then you have a problem with your data modeling. Basically, you have to modify all queries that refer to experiments to include the new types.

Of course, you could encapsulate this logic in a view and update the view every time you add (or modify) a type.

There are a lot of considerations when trying to model this.

One alternative is an entity-attribute-value (EAV) model. This basically means storing the attributes on separate rows. However, this has a few downsides, such as:

  • The value tends to be a string. It is complicated to support different types.
  • The values are in separate rows, which is inconvenient.
  • Searching for particular combinations of values for a given experiment is a little complicated.

I can think of two alternatives to this approach. One is to store all the values in a single table, modifying the table for new types. Unused values would be NULL.

Another alternative is to store the additional attributes within each row, typically using JSON (but there may be other methods as well).

The JSON approach works pretty well if most of the information is in common.

The original approach using separate tables can be quite useful if other tables have foreign key references to a particular type. That is, there are different approaches. The best method depends on the particular situation.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
1

I would avoid any kind of 'horizontal' structures (attr1, attr2 ... attrN columns) - this makes data management pretty hard - and there will be always some attribute, which doesn't fit into existing structure and you need to create more columns etc etc.

I would create two tables - one table for attributes (PK, name, description + possibly type, kind etc) (at least for generic ones, but I would include all possible specific ones OR one attribute for specifics - depend on real needs, like applicabilty of specific attributes to multiple experiments) and another (relation) table for attribute values for experiments (linked to experiments and attributes).

If you need some kind of "templates" for different experiment types, then more tables may be needed.

Think about Gordons answer too - depend on other constraints (data insertion-updating-deletion logic, history management, analyzis needs, client application design, DBMS used) JSON solution could be more "compact" alternative.

Arvo
  • 10,349
  • 1
  • 31
  • 34