0

I have 3 university source databases from which I'm loading the data into my data warehouse dimension tables. Since all 3 sources will be in the same dimension tables, how do I know which rows are from which university? For example; if I wanted to query the transcript of one student from a specific university over multiple semesters, in my fact table (fact_transcript) I would be able to distinguish using a student unique social security number (ssn) or I could add a university attribute to the fact table (but i don't think this is correct) but for my course table and others; if all 3 universities number their course ids the same (e.g 1, 2, 3..) i wouldn't know which courses belong to which university so that I could load the students and course to the fact table correctly. I would have something like this for the course dimension:

COURSE

course_id | c_name | cr_hours ...

1 | from_university_1 | 3
2 | from_university_1 | 3
3 | from_university_1 | 3
1 | from_university_2 | 3
2 | from_university_2 | 3
3 | from_university_2 | 3
1 | from_university_3 | 3
2 | from_university_3 | 3
3 | from_university_3 | 3

The names could be the same or different, I just chose those to show you what I mean. Would I transform the IDs to range from 1-1000 for uni1 then 1001- 2000 for uni2 and so on or would I attach an abbreviation before each course name (and do the same for other dimensions) like; UNI1-coursename , UNI2-coursename? Thank you for any help.

moe
  • 1
  • in the step that you load the dim in, add one more column to hold university name is ok? – Phung Duy Phong Feb 25 '20 at 02:36
  • @PhungDuyPhong so would I do that for all of my dimension tables? Is this good practice? Thank you. – moe Feb 25 '20 at 03:47
  • Yes it is, in my company (implement DWH for large business), always includes source column in dimension (if you need), but most of the time it is a separate dimension for fact table. In your case I think university in fact table is ok right? Since it make senses to be additional dimension (as starschema) – Phung Duy Phong Feb 25 '20 at 04:16
  • Also transform ID in range from 1-1000 for uni 1 will probably caused you trouble in the future, as for business user, they always want to slice data on university info, and computation on every filter might be slow for large fact table – Phung Duy Phong Feb 25 '20 at 04:17
  • Dont take my word as only because I work for solution on large company, every project I do for each company, we always depends on users usage to determine the design to make the best performance, so. – Phung Duy Phong Feb 25 '20 at 04:18
  • @PhungDuyPhong thank you. I'll implement it as you described and see how it works out. – moe Feb 25 '20 at 14:39

0 Answers0