0

I want to store about 100k rows of data, and all data some common field. All data have a category and other fields is base on category.

For example if data is in category 1, It had extrafield1 and extrafield2

I search and found two way for storing data.

1-Name value pair

Table1
ID    Name     Category   Field2           Field3
1     Name1    1          Value            Value
2     Name2    2          Value            Value

Table2 
ID    Table1_ID         Name           Value
1     1                 extrafield1    1
2     1                 extrafield2    2
3     1                 extrafield3    3
4     2                 extrafield4    4
5     2                 extrafield5    5

2-Parent Child table

Table1
ID    Name     Category   Field2           Field3
1     Name1    1          Value            Value
2     Name2    2          Value            Value

Tableforcategory1 
ID    Table1_ID         extrafield1    extrafield2     extrafield3
1     1                 1              2               3


Tableforcategory2 
ID    Table1_ID         extrafield4    extrafield5
1     2                 4              5   

So my question is when use method 1 and when use method 2.

Ali Akbar Azizi
  • 3,272
  • 3
  • 25
  • 44
  • If you use Postgres you could use a single table with a column for each attribute that belongs to all categories and one `hstore` (key/value) column to store the extra attributes –  Jan 10 '16 at 15:46

2 Answers2

1

Method 2 is generally preferred for a variety of reasons:

  • It more closely models the entities represented by the different categories.
  • It allows for the columns to have different data types.
  • It makes it easier to implement check constraints for value-only columns.
  • It makes it easier to implement foreign key constraints for reference columns.
  • It makes it easier to implement unique constraints, should these be appropriate.
  • It makes it easier to implement not-NULL and default values.
  • It makes it easier to add columns on specific attribute values.

And there may be other reasons.

The first method -- which is called entity-attribute-value modeling (EAV) -- is definitely an alternative. It is mostly suitable in two situations:

  • The number of attributes exceeds the column limit in the database being used.
  • The attributes are sparsely populated, so only a few are in use for any given entity.

Sometimes a hybrid of these two methods is appropriate, with commonly used attributes being stored in a relational format and sparse attributes stored as EAV.

There are alternative approaches, such as storing the values in a JSON or XML object. These are not generally recommended, but might be suitable in some databases under certain circumstances -- particularly when all attributes need to be treated as a single block and returned and set together.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • I have another question, If i have only 4 category and I need use all extra field use method 2 ? and if I have a lot of category use method 1 ? – Ali Akbar Azizi Jan 10 '16 at 15:21
  • In general, you shouldn't use EAV unless you ave a good reason for choosing it. – Gordon Linoff Jan 11 '16 at 02:58
  • if I have a lot of category, in method 2 i should create a lot of table, Is it good reason to use EAV ? – Ali Akbar Azizi Jan 11 '16 at 12:42
  • @CooPer . . . Each category would be a separate *row* not a separate *column*. Don't confuse entities and attributes. – Gordon Linoff Jan 11 '16 at 13:07
  • All fields is base on category. if `category = 1` then i have extrafield1 extrafield2 , if `category=2` i have extrafield3 ... . and in method 2, i must create a table for each category. is it good reason to use method 1 when i have a lot of category ? – Ali Akbar Azizi Jan 11 '16 at 14:05
0

It depends on the type of queries and the stability of the data model.

If your queries are essentially static, meaning you know when you are going to use "extrafield_x", then method 1 is simpler and more efficient, but less flexible.

If you need more dynamic queries and in time you might more categories and more "extrafields", method 1 is more flexible, no data model maintenance required, but more complex to use and probably slower.

koriander
  • 3,110
  • 2
  • 15
  • 23