0

I'm quite new to BI designing DB, and here some point I do not understand well. I'm trying to import french census data, where I got population for each city. For each city, I have population with different age classification, that can't really relate with each other.

For instance, let's say that one classification is 00 to 20 years old, 21 to 59, and 60+ And the other is way more precise : 00 to 02, 03 to 05, etc. but the bounds are never the same as the first one classification : I don't have 15 to 20, but 18 to 22, for example.

So those 2 classifications are incompatible. How can I use them in my fact table ? Should I use 2 fact tables and 2 cubes ? Should I use one fact table, and 2 dimensions for 1 cube ? But in this case, I will have double counted facts when I'll sum to have total population for a city, won't I ?

This is national census data, and national classifications, so changing that or estimating population to mix those classifications is not an option. And to be clear, one row doesn't relate to one person, but to one city. My facts are not individuals but cities' populations. So this table is like : Line 1 : One city - one amount of population - one code for dim age (ex. 00 to 19 yo) of this population - code (m/f) for the dim gender of that population - date of the census Line 2 : Same city - one amount of population - one code for dim age (ex. 20 to 34) of this population - code (m/f) for the dim gender - date of the census And so it goes for a lot of cities, both gender, and multiple years. Same

I hope this question is clear enough, as english is not my native language and as I'm quite new in DB and BI !

Thanks for helping me with that.

Groflo
  • 17
  • 8
  • I suggest you break the quesiton into two, keep this one for the incompatible ranges and open a new one for the second part. – Victor HDC May 20 '19 at 15:01
  • About the incompatible ranges, is it possible to get the actual ages? If this is not possible there are ways to approximate a unified range but it will not be exact. – Victor HDC May 20 '19 at 15:05
  • Thanks for your 2 answers, I've edited my OP adequately. – Groflo May 22 '19 at 18:09

2 Answers2

0

One possible solution using a single fact table and two dimensions for the age ranges:

1 - Categorical range based on the broadest census, for example:

  • Young 0-20
  • Adult 21-59
  • Senior 60+

You could then link the other census to this dimension with approximate values, for example 18-22 could be Young.

2 -Original age range. This dimension could be used for precise age ranges when you report on a single city, it can also help you evaluate the impact of the overlapping bounds (e.g. how many rows are in the young / 18-22 range?)

Victor HDC
  • 525
  • 1
  • 6
  • 12
0

you can crate one dimention as below

young 1-20 adult 21-59 senior 60+

Classification is young city 1 : 1-20 young city 2 : 4-23

 id  field1   field2         field3          field4  .......

 1   1 year   young_city_1   other           .......
 2   2 year   young_city_1   other           .......
 3   3 year   young_city_1   other           .......
 4   4 year   young_city_1   young_city_2    .......

Now you can report from any item and with any division

i hope it is help you