
I'm new to star schemas and I'm confused about which variables to include in one. Say I have a dataframe of movies, with columns for director, actors, ratings, reviews, genres, etc. If I want to build a star schema, should I try to include every column? Can someone please explain this to me? Thank you.

efsee
  • What are your requirements? You should include the things your customers /users will require. If you don't know what that is then no one here can help you. – nvogel Oct 10 '17 at 06:28
  • You seem to be asking for a tutorial on star schema. There are tutorials out on the web. If you want to know how to analyze the data, look into multidimensional data modeling. – Walter Mitty Oct 10 '17 at 11:30

1 Answer


There are many ways to answer this question, because it depends on your development organization and on who will receive the solution :)

For example, you could include only attributes that are important to the business processes that you are supporting. In a sales data mart you would probably include the sales representative but exclude his shoe size. Well, unless the company sells shoes...

You could include only attributes that you can reliably test and verify upfront. This may seem rigid, but depending on your organization it may save you a lot of support work...

You can include only attributes that are specifically requested by the user community. This way there is always a log of what information is available, why it was made available and who requested it.

But I think it is universally a bad idea to include everything you have just because you have it.

Obviously you will use some combination of the above depending on your organization.

Ronnis
  • Thanks for the answer. If I just want to build a database now for query and search later, and I don’t know what kind of problems I will be solving, in that case, should I include everything then? – efsee Oct 10 '17 at 13:32
  • Normally we build stuff because we need it, so normal rules don't apply here :) On a more serious note, start with one fact table that supports the most important business process and go from there – Ronnis Oct 10 '17 at 15:33
  • Start with understanding the difference between a fact and a dimension, then begin your design with the simplest event you want to model. – Wes H Oct 23 '17 at 18:26
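
To make the "one fact table first" advice from the comments a bit more concrete, here is a minimal sketch using Python's built-in sqlite3 module. Everything in it is an assumption for illustration: the sample rows, the table and column names (`dim_movie`, `dim_director`, `dim_genre`, `fact_rating`), and the choice of "one rating per movie" as the event being modeled. It is not a recommendation of which columns your schema should have; that still depends on the requirements discussed above.

```python
import sqlite3

# Hypothetical flat rows standing in for the movie dataframe in the question.
flat_rows = [
    {"title": "Movie A", "director": "Director X", "genre": "Drama", "rating": 8.1},
    {"title": "Movie B", "director": "Director X", "genre": "Comedy", "rating": 6.4},
    {"title": "Movie C", "director": "Director Y", "genre": "Drama", "rating": 7.3},
]


def build_star_schema(rows):
    """Load flat rows into one fact table surrounded by three dimensions."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Dimensions: descriptive attributes you filter and group by.
        CREATE TABLE dim_movie    (movie_key    INTEGER PRIMARY KEY, title TEXT UNIQUE);
        CREATE TABLE dim_director (director_key INTEGER PRIMARY KEY, name  TEXT UNIQUE);
        CREATE TABLE dim_genre    (genre_key    INTEGER PRIMARY KEY, name  TEXT UNIQUE);
        -- Fact: one row per modeled event, holding keys plus a measure.
        CREATE TABLE fact_rating (
            movie_key    INTEGER REFERENCES dim_movie(movie_key),
            director_key INTEGER REFERENCES dim_director(director_key),
            genre_key    INTEGER REFERENCES dim_genre(genre_key),
            rating       REAL
        );
        """
    )

    def dim_key(table, key_col, attr_col, value):
        # Table and column names are fixed literals from this sketch,
        # so formatting them into the SQL string is safe here.
        conn.execute(
            f"INSERT OR IGNORE INTO {table} ({attr_col}) VALUES (?)", (value,)
        )
        row = conn.execute(
            f"SELECT {key_col} FROM {table} WHERE {attr_col} = ?", (value,)
        ).fetchone()
        return row[0]

    for r in rows:
        conn.execute(
            "INSERT INTO fact_rating VALUES (?, ?, ?, ?)",
            (
                dim_key("dim_movie", "movie_key", "title", r["title"]),
                dim_key("dim_director", "director_key", "name", r["director"]),
                dim_key("dim_genre", "genre_key", "name", r["genre"]),
                r["rating"],
            ),
        )
    conn.commit()
    return conn


def avg_rating_by_genre(conn):
    """Typical star-schema query: join the fact to a dimension and aggregate."""
    return conn.execute(
        """
        SELECT g.name, AVG(f.rating)
        FROM fact_rating AS f
        JOIN dim_genre AS g USING (genre_key)
        GROUP BY g.name
        ORDER BY g.name
        """
    ).fetchall()
```

Calling `avg_rating_by_genre(build_star_schema(flat_rows))` returns one row per genre with its average rating. Note that columns such as actors or reviews are deliberately left out here: they would be added as further dimensions or facts only once a concrete question calls for them.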