0

Scenario: There are 3 kinds of utilization metrics that i have derive for the users. In my application, users activity are tracked using his login history, number of customer calls made by the user, number of status changes performed by user.

All these information are maintained in 3 different tables in my application db like UserLoginHistory, CallHistory, OrderStatusHistory. All the actions made by each user is stored in these 3 tables along with DateTime info.

Now i am trying to create a reporting db that will help me in generating the overall utilization of user. Basically the report should show me for each user over a period:

  1. UserName
  2. Role
  3. Number of Logins Made
  4. Number of Calls Made
  5. Number of Status updates Made

Now i am in the process of designing my fact table. How should i go about creating a Fact table for this scenario? Should i go about creating a single fact table with rows in it capturing all these details at the granular date level (in my DimDate table level) or 3 different fact tables and relate them?

The 2 options i described above arent convincing and i am looking for better design. Thanks.

Vinoth
  • 2,419
  • 2
  • 19
  • 34
  • Based on your description you have a user dimension (with role in it) and you have a fact table at an unknown level of detail with facts logins, calls, statuses. You'll need to explain your problem a little better because I don't see a problem! – Nick.Mc Mar 11 '15 at 10:44
  • Nick.McDermaid - I have updated. – Vinoth Mar 11 '15 at 10:51
  • Are all the facts at the same level of granularity? if so there is absolutely no reason to put them in multiple facts. One objective of DW's is to make querying easier. Why make it harder by needlessly putting them in three fact tables? – Nick.Mc Mar 11 '15 at 10:53
  • Yes they are all at same granular level. But you see the table can grow so huge coz for each granular datetime i have to fill in activities of user and there could be no login activity at all or no call activity and so on.. is that a good practice? – Vinoth Mar 11 '15 at 11:02
  • 1
    So you're saying you need to have records that represent no activity at all? There may be other ways to represent that, for example your user dimension should have active date ranges which can be used to identify periods with no activity. Also it's probably easier to performance tune one huge table then performance tune three slightly smaller tables that need to be full outer joined (or unioned). – Nick.Mc Mar 11 '15 at 13:17
  • is the period day based (i.e. from 1/3/2015 to 31/3/2015) or the users can have a report also using hours (i.e from 1/3/2015 13:15 to 31/3/2015 20:20)? also how often do you query for all three metrics and how often for a single one? – mucio Mar 12 '15 at 22:04
  • @mucio. The reports shows all the metrics in a single row. Though right now the report is daily, i strongly suspect there could be requirements for real time metrics like hourly or even in the range of minutes? This is exactly where i am looking for a better design – Vinoth Mar 13 '15 at 06:02

1 Answers1

2

As rule of thumb, when you have a report which uses different facts/metrics (Number of Logins Made, Number of Calls Made, Number of Status updates Made) with the same granularity (UserName, Role, Day/Hour/Minute), you put them in the same fact table, to avoid expensive joins.

For many reasons this is not always possible, but your case seems to me a bit different.

You have three tables with the user activity, where probably you store more detailed information about logins, calls and Status updates. What you need for your report is a table with your metrics and the values aggregated for the time granularity that you need.

Let's say you need the report at the day level, you need a table like this:

Day        UserID RoleID #Logins #Calls #StatusUpdate
20150101   1      1      1       5      3
20150101   2      1      4       15     8

If tomorrow the business will require the report by hour, the you will need:

DayHour            UserID RoleID #Logins #Calls #StatusUpdate
20150101 10:00AM   1      1      1       2      1
20150101 11:00AM   1      1      0       3      2
20150101 09:00AM   2      1      2       10     4
20150101 10:00AM   2      1      2       5      4

Then the Day level table will be like an aggregated (by Day) version of the second one. The DayHour attribute is child of the Day one.

If you need minute details you go down with the granularity.

You can also start directly with a summary table at the minute level, but I would double check the requirement with the business, usually one hour range (or 15 minutes) are enough.

Then if they need to get more detailed information, you can always drill down querying your original tables. The good thing is that when you drill to that level you should have just a small set of rows to query (like just few hours for a specific UserName) and your database should be able to handle it.

mucio
  • 7,014
  • 1
  • 21
  • 33