What is the best way to design a database to store record with a lot of values?

Question

I want to design a database for events and track a lot of statistic about the it.

Option 1

Create one table for Events and put all my statistic column in it. Like number of male, number of female, number of unidentified gender, temperature that day, time it started, any fights, was the police called, and etc.

The query would be a very simple select * from events

Option 2

Create two tables, one for Events and one for EventsAttributes. In the Events table I would store important stuff like id, event title, and start/end time.

In EventsAttributes I would store all the event statistic and link them back to Events with a eventId foreign key.

The query would look like below. (attributeType == 1 would represent number of males)

select e.*, (select ev.value from EventAttributes ev where ev.eventId = e.id and attributeType = 1) as NumberOfMale from Events e

The query would be not be as straight forward as option 1, but I want to design it the right way and live with the messy query.

So which option is the right way to do it, and why (I'm not a database admin, but curious).

Thank you for your time.

Option 1 sounds like a typical way to deal with this. Option 2 sounds like EAV (entity attribute value) which is an evil anti-pattern that should be implemented very rarely. Think of a table as a way to hold all properties of an entity, in this case an event. I might consider some additional tables for things that might happen more than once during an event (like police calls, injuries etc). — Sean Lange, Jun 19 '17 at 21:18
In the Option 2, you forgot the third table, AttributeTypes. If you hardcode this lookup into your app, that would be an evil thing indeed. — Roger Wolf, Jun 20 '17 at 04:18
@SeanLange For things that might happen more than once, I plan to represent them by number. So for police call, the column would be an `int` type. Thanks for the input. — binary_assemble, Jun 20 '17 at 11:40
So you would only record the number of times the police were called? You don't need any additional details about those calls? Things like what times they were called etc... — Sean Lange, Jun 20 '17 at 13:16

M.Hassan · Answer 1 · 2017-06-20T12:44:26.453

0

I prefer using option 2 for designing database.

In that option(2), you apply the best practice of database normalization.

There are three main reasons to normalize a database:

The first is to minimize duplicate data.
The second is to minimize or avoid data modification issues
The third is to simplify queries.

For more details, read Designing a Normalized Database

You can create views (queries) based on this normalized database to support Option (1).

In this way, database will be ready for any future scaling.

Update:

You can use the the valuable operator pivot and common table expressions (CTE) to get eventAttributes1, eventAttributes2, ...

Suppose your tables are :events and event_attributes as described below:

    events
   ----------
    # event_id
    event_title
    start_date
    end_date

    event_attributes
    -------------
    #event_id
    #att_type
    att_value

    # is primary key

    -- using table expression (it's like a dynamic view)

    with query as (
    select e.event_id, e.event_title,a.att_type, a.att_value
    from events e
    join event_attributes a on e.event_id =a.event_id
    )
    select event_id , event_title,
    [1] as eventAttributes1, -- list all eventAttributes1 numbered [1],[2],...
    [2] as eventAttributes2
    [3] as eventAttributes3
    FROM query  
    PIVOT(SUM(att_value) FOR att_type IN ([1],[2],[3])) as pvt

For details on pivot read: Using PIVOT

For details Using Common Table Expressions

edited Jun 20 '17 at 12:44

answered Jun 20 '17 at 00:28

M.Hassan

10,282
5
65
84

I prefer option 2, but how would you efficiently query the two tables so the data are combined like, `id, eventName, eventAttributes1, eventAttributes2, ...` – binary_assemble Jun 20 '17 at 11:36
Option 2 is an EAV which can get incredibly complicated very quickly. This is not something I would recommend for somebody who is relatively new to the database world. Yes I absolutely agree that normalization is the best way to go but I don't think this situation needs an EAV. – Sean Lange Jun 20 '17 at 13:18
@Sean Lange , Any one new to the database world, sooner will be expert and database will be complex to be normalized in the future with huge data :). – M.Hassan Jun 20 '17 at 14:06
Well we don't know how much data we are dealing with here. And by no means did I ever suggest they don't normalize their database. EAV is a horrible pattern in almost every situation. Very rarely should it be used and almost never a total EAV solution. A hybrid EAV portion is generally the best approach. – Sean Lange Jun 20 '17 at 14:14
I agree, To EAV, or not to EAV? Choosing your data model.https://wq.io/1.0/docs/eav-vs-relational – M.Hassan Jun 20 '17 at 14:39
@SeanLange My estimate would be around 110 columns if I combine the Events information and the Event Attributes. – binary_assemble Jun 20 '17 at 20:45
That sounds like an awful lot of columns to me. I suspect some normalization would make this a lot cleaner. – Sean Lange Jun 20 '17 at 21:05
@SeanLang. So, for 110 columns,normalization (or EAV) fit for purpose. Is it right? – M.Hassan Jun 21 '17 at 15:11
Not quite sure your question. But 110 columns sounds like it needs to be normalized. One way to accomplish that might be by a partial EAV but that would be a last resort and not something I would do unless the data really needs that. Since we still don't know what the OP is really storing it is impossible for us to know. – Sean Lange Jun 21 '17 at 15:15

What is the best way to design a database to store record with a lot of values?

1 Answers1