1

I'm building a star schema to allow reporting against usage of an application, but have hit a stumbling block on two points:

  1. Some answers can be free text.
  2. I need to report on what questions were not answered by what users.

To elaborate:

Application has a section where user answers questions.

A question can have the following types of answers:

  • Multiple choice (radio buttons), "choose 1 out of 4."
  • List item.
  • Multiple choice (tick box), "choose one or more of the following."
  • free text (text box), "what are your thoughts on...?", or used if "other" radio button is chosen.

I was considering a questions dimension, and an answers dimension. The problem with this is the free text will pollute the answers dimension with lots of answers that are specific to only one user on one question.

This model also means that there is a row in the fact table for each answer for every user, which I don't think is ideal - but I can live with this if necessary. The fact table is mostly just to link the other tables together. Fact table will be roughly: Fact_Table_ID, UserID, QuestionID, AnswerID, Is_Answered (boolean, yes/no). This is all a way to see which questions are being answered, and which are not being answered.

I will have to report on questions that have not been answered by users as well as what has been answered, and I don't know how to address this in a star schema.

I'd appreciate some suggestions on how best to approach this.

Jo Douglass
  • 2,055
  • 1
  • 19
  • 30
wilson_smyth
  • 1,202
  • 1
  • 14
  • 39
  • What is in your Fact table? It's not very clear from what you've said here - could you edit your post to explain its structure and how you envision it joining to your Question and Answer dimensions? – Jo Douglass Jun 03 '15 at 16:22
  • the fact table is mostly just to link the other tables together. Fact table will be roughly: Fact_Table_ID, UserID, QuestionID, AnswerID, Is_Answered (boolean, yes/no) Thats about it.this is all a way to see what questions are being answered and what are not being answered. I do however see them wanting to drill down into individual answers and thats where the freetext & multiple choice gets tricky. – wilson_smyth Jun 03 '15 at 20:49

1 Answers1

0

While I will still give an answer for your question, I feel I have to start by questioning whether this data really belongs in a star schema. Some data isn't a natural fit for a dimensional model - might this be such a case? Have a think about the way the data is going to be used, and before you go too far down this path, decide whether it makes more sense to hold this data in a different format.

Ralph Kimball sometimes discusses having an "ODS" which is a non-dimensional data store. This can be used for certain reports and queries which can't be fulfilled by your transactional system, but don't belong in the dimensional data marts or data warehouse for whatever reason.


That said, if you decide to build this in a dimensional model, my advice would be to put the free-text answers into a separate dimension from the standard answers. Then people can analyse the standard answers without needing to reference an over-sized answer dimension, but can still get to those free text answers when needed.

Rather than the Is_Answered bit column, you could have a row in your answers dimension which means "no answer given" - i.e. treat no answer as an answer itself. Whenever you find a question which someone hasn't answered in the ETL, you'd look up that particular row in the answer dimension.

Jo Douglass
  • 2,055
  • 1
  • 19
  • 30
  • Thanks Joe. I was thinking that myself, that it was more a standard relational model than a star schema, but really wanted it to fit into a star. It may simply not be the case. Il spend more time on this and update the thread when i have a conclusion. hopefully it might help guide someone else with a similar issue – wilson_smyth Jun 04 '15 at 12:45