I'm framing this intentionally as a general question, even though I'm using Django's ORM to implement these tables. Here are my Django models if you want to have a peek.
The Riddle
I have several tables with the following relationships:
Survey <--m2o-- Page <--m2o-- Category <--m2o-- Question <--m2o-- Choice <--?-- Response
| | | | |
|- name |-number |- name |- sortid |- sortid
|- sortid |- text |- text
|- text |- short
|- qtype
Where m2o
is short-hand for many-to-one, each table has an implicit primary key, and my question is with regard to how to organize the response table.
The Response table should have some kind of relation out to a User table (depending on the table definition, this might be one-to-one or many-to-one). The purpose of the table is to store user responses to particular questions. The problem I'm having is that depending on question type, I would need to have the response stored in a different column type in the Response table. For example a table might resemble:
| user | (question) | (qtype) | choice | response |
|------+------------+-----------------+-----------+-------------------------------|
| 1 | Q1 | Select One | A | False |
| 1 | '' | '' | B | False |
| 1 | '' | '' | C | True |
| 1 | .. | '' | D | False |
| 1 | Q2 | Select Multiple | 1 | True |
| '' | '' | '' | 2 | True |
| '' | '' | '' | 3 | False |
| '' | '' | '' | 4 | True |
| '' | Q3 | Long Text | NULL | "It was the best of times..." |
| '' | Q4 | Select Explain | A | False |
| '' | '' | '' | B | False |
| '' | '' | '' | C | False |
| '' | '' | '' | D (other) | True |
| '' | '' | '' | Explain | "I actually prefer bananas." |
I've added columns for clarity that would be implicitly defined by relationships, and therefore not present in the actual table. They're denoted with ( )
. Also, choice
would be actually be a foreign key.
It would appear that the response column might need to be a text column. Breaking it out into two or more columns like response-text
and response-boolean
feels messy. We could easily have hundreds of rows for a single user.
I've also thought about organizing the table like this:
| user | survey | response (key-value store) |
|------+--------+--------------------------------------------|
| 1 | 1 | {"q1": "C", "q2": "1,2,4" |
| | | "q3": "It was the best of times..." |
| | | "q4": "Other: I actually prefer bananas."} |
| 2 | 1 | {...} |
| 3 | 1 | {...} |
How would you define this table, and why?
Bonus points if you explain in what contexts your definition would not work well.
If you'd like to imagine my situation, here are some notes on my context:
- The response table might be updated a few times during the survey, but never afterward. This particular table is OLAP, or read-mostly. The only queries I'll be doing are to export the table for statistical analysis in a commercial software package.
- Responses from different users will rarely be identical (from a high level)
- Performance isn't an issue in the sense that only 15 users will be simultaneously connected to the application during a survey administration session. Only one user (me) will be looking at the data after it is stored.
I suspect my situation is close to the example of N1NF on Wikipedia.
Should this table be polymorphic?
I've just heard of multiple-table inheritance, or polymorphism, and I'm not sure how to apply it to this problem. Specifically, I'm not sure which tables would benefit from inheritance. For example, should Response extend Choice? Should Page extend Survey?