0

Looking for inputs on a data warehouse schema design. Here is the scenario:

I have an Action Table and User Table that is currently joined based on UserId to get the details about the user who did an action.

Action Table:
    UserId   Action
    123      Test001

User Table:
    UserId    UserName
    123       Adam

Now, we have to migrate the users to a new user management system (UMS) and here is how it works:

  1. Existing users will be migrated to UMS and new UserId (let's call this as a Modern UserId and existing UserId as Legacy UserId) will be assigned. So, the new records coming-in for new actions will carry the new UserId.
  2. New users created in UMS will only have Modern UserId and Legacy UserId will be run.
  3. Migrated users will have both Legacy UserId and Modern UserId.

Now, when we do reporting, we have to expose both historic and new action data. Wondering what should be the ideal schema design so we can report both historic & new actions and map them to the right user.

Platform: SQL Server 2016, Analysis Services

Please let me know if you need more details.

WhatsUp
  • 73
  • 1
  • 7
  • details around the migration of userids would be useful, the technical details will flush out if you can use a mapping table, or if you can alter schema to add a column in the new table for the Legacy UserID. Sounds like you have a mapping but I'm not sure from your number 2, how does this process work is there room for the legacyid in the table or is this going to be a secondary process – Random_User Feb 07 '18 at 21:59
  • Migration is being done by my upstream so I do not have more details. However, I have the flexibility to alter the schema for both action table and user table and perform any lookup as required during our ETL process. There is no mapping table currently but I would like to avoid mapping table and see if we can add a column like UniqueAgentId on both Action table and User Table so the join can be done on that single column but not sure whether this is the best approach in terms of implementation, maintenance and performance. – WhatsUp Feb 07 '18 at 22:23
  • 1
    It's just another source system providing another attribute - add another column to the User table. You will need some kind of mapping somewhere to know which record to attach it to – Nick.Mc Feb 07 '18 at 23:04
  • Thanks Nick. I want to avoid another JOIN so is there an approach i can take without using a mapping table. – WhatsUp Feb 08 '18 at 04:31
  • Databases are full of joins. How are you going to match the id's between the system without a mapping table? – Nick.Mc Feb 10 '18 at 08:37

2 Answers2

1

You have not given us any additional information as to the details of how this is going to be done, so the sql-server tag doesn't really help us here. This is more of a modeling question.

When you speak of a new id for a column there has to be some way during the creation of said key to ensure integrity, that process will dictate to some degree the methods you have to provide the solution.

The User table looks to be a table of unique values and where the creation of the 'Modern Key' is created, if you can edit this table, you should add the 'legacy key' here. This becomes your mapping table, mapping tables do not have to be a separate object.

Random_User
  • 363
  • 1
  • 7
  • Agreed, when your upstream team performs the migration to UMS, they should preserve in some way a mapping between Legacy User IDs and Modern User IDs. Downstream in the warehouse, I suggest that you keep both IDs in your User dimension table but generate a surrogate key in this table which will serve as a primary key (it can be just an incremental integer). This way you can use the surrogate key as a foreign key in your Action fact table whether the User is a modern or legacy one. – Alexis.Rolland Feb 08 '18 at 14:31
0

Agreed with the previous answer. When your upstream team performs the migration to UMS, they should preserve in some way a mapping between Legacy User IDs and Modern User IDs. Downstream in the warehouse, I suggest that you keep both IDs in your User dimension table but generate a surrogate key in this table which will serve as a primary key (it can be just an incremental integer). This way you can use the surrogate key as a foreign key in your Action fact table whether the User is a modern or legacy one.

Here is my proposal of data model design for your tables:

DIM_USER
- USER_KEY (pk)
- USER_ID
- USER_ID_LEGACY
- USERNAME
- ....

DIM_ACTION
- ACTION_KEY (pk)
- ACTION
- ....

FACT_ACTION
- USER_KEY
- ACTION_KEY
- ....
Alexis.Rolland
  • 5,724
  • 6
  • 50
  • 77
  • Thanks, I finally did an approach similar to what you have mentioned. I used ISNULL(ModernUserId, LegacyUserId) as the key column in the user dimension table. – WhatsUp Mar 19 '18 at 02:57