1

According to data warehouse concepts, is it correct to have a relation (1-m or even m-m) between two fact tables?

Twetter scenario would be an example. We can suppose that we have two Fact tables (Tweets and Users). If we want to know which user has tweeted a tweet or the tweets were made by a use, we have to join this two fact tables. So is it a (m-m) relationship between Fact-tweets and Fact-user? Or there is another way to structure this issue?

Ziad Salem
  • 496
  • 13
  • 34

2 Answers2

2

No, it's not ok to have relations between fact tables directly. You can only relate them via shared dimensions.

In your Twitter scenario, User is not a fact, it's a deminsion. Then you will have 1:m relations between Dim User and Fact Tweets, and analysis is streighforward:

Count(Tweets) By User
RADO
  • 7,733
  • 3
  • 19
  • 33
  • Well, you are right, that user is a dimension (and I have it already in my schema), but the example is more complicated and long to write it all in the question. We have this tweeter example as a training project of a course at my College. There are some measures about the users. Therefore, we have a dimension and a fact table for users. One example of a measure in the fact users. Tracking changes of followers number in an hourly window between two different tweets of the user. – Ziad Salem Mar 26 '18 at 19:55
  • RADO is correct. You can't relate events without a dimension between them. In your example, tweets by the change in followers would be meaningless without a User dimension to tie the events together. The users&followers fact table should be at the grain of the individual follower, which can then be aggregated by hours (another dimension). – Wes H Apr 02 '18 at 14:48
  • To clarify my comment the users&followers fact table should capture a follower subscribing to follow a user. That is the event that occurred and the grain at which it should be captured. – Wes H Apr 02 '18 at 15:03
  • I also do not agree to have a relation between two facts. I have already the user dimension as well. @WesH You are right since you supposed that the follower measure is viewed by the time dimension. However, that would not work in my scenario, because the task is to track the followers' number between different tweets. e.g. User "A" tweeted (TweetID: 10001) and its followers' number was (120). this user tweeted again (tweetID: 10009). Now the followers' number is (125). – Ziad Salem Apr 03 '18 at 15:21
  • Tracking followers' number of users is only an example. There are other tasks has the same scenario where we have to track changes according to tweets. – Ziad Salem Apr 03 '18 at 15:21
  • Can we say that it is not possible to structure the data warehouse according to this scenario and should these measures tracked by time rather than tweets? – Ziad Salem Apr 03 '18 at 15:26
  • This is a common usage of a data warehouse, and DWs are ideally suited for this type of analytics. I'll post an answer in a bit as the response is too long for a comment. – Wes H Apr 03 '18 at 15:29
2

No. You do not relate facts to each other. Every column in a fact table should be either a FK to the related dimension record, or be an intrinsic value of the event.

fFollowers should relate to User(Tweeter), Date(FollowDate), Time(FollowTime), User(Follower). You'll also need CancelDate/Time, which should point to the Unknown/Future date dimension record if the Subscription is active. The default measure would be count.

dTweet would likely be a degenerate dimension, tying the attributes that are not related to a dimension and are not a measurable value, such as the tweetId(the key) and the lat/long of where the tweet was sent. This dimension may not be necessary.

fTweet should relate to User(Tweeter), Date(TweetDate), Time(TweetTime), and dTweet if you deem that dimension is necessary. Count would be a measure. You may also have text length as a measure.

In your business problem. you want to count the number of followers at the time a tweet occurred. You'll need to write a measure expression for fTweet that retrieves fFollowers.Count where TweetDate/Time between FollowDate/Time and CancelDate/Time. I would name this RecipientCount and it would be a measure of the tweet. If this measure is on fTweet, when you slice by dTweet.ID, it should return see the number of recpients of a tweet.

Wes H
  • 4,186
  • 2
  • 13
  • 24