
I've noticed a pattern that seems pretty obvious now, and I need to get your opinion on it.

Suppose we have a one-to-many relationship from table 1 to table 2 in a relational model. For example, table 1 could be a User table and table 2 could be a Login table that logs all user logins. One user can log in multiple times, and given a user we can find all logins by that user.
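
For concreteness, a minimal sketch of the two tables might look like this (table and column names are illustrative, and MySQL-style syntax is assumed since MySQL comes up later in the thread):

```sql
-- Design 1: logins are stored only in the LOGIN table.
CREATE TABLE user (
    id        BIGINT AUTO_INCREMENT PRIMARY KEY,
    user_name VARCHAR(100) NOT NULL
);

CREATE TABLE login (
    id         BIGINT AUTO_INCREMENT PRIMARY KEY,
    user_id    BIGINT NOT NULL,
    login_time DATETIME NOT NULL,
    FOREIGN KEY (user_id) REFERENCES user(id)  -- one user, many logins
);
```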

The first idea that comes to mind is to store the logins only in the Login table. This is Design 1.

But if for some use cases we are interested in a particular login of the user (say the last login), it is "generally a good idea" to cache the last login time in the User table itself (call this Design 2). Is that right?

Design 2 is obviously redundant, as we can always find the last login time by performing a join and then discarding all but the most recent login.

For a single user, either design should be fine. But if you want to find the last login time for all users with a single SQL query, Design 1 would involve a join and a subquery to filter out the unneeded rows.
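
With Design 1, the all-users query could look roughly like this (a sketch only, reusing the illustrative table names from above):

```sql
-- Design 1: last login time for every user, via a join plus a correlated subquery.
SELECT u.id, l.login_time AS last_login_time
FROM user u
JOIN login l ON l.user_id = u.id
WHERE l.login_time = (SELECT MAX(login_time)
                      FROM login
                      WHERE user_id = u.id);
```

The same result can also be written as a `LEFT JOIN` with `GROUP BY u.id` and `MAX(l.login_time)`, which is the form suggested in the comments on the answer below.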

But given our use case, it is a good idea to store the last login time in the User table itself, which saves us from the join. Is that right?
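
A sketch of Design 2, assuming a nullable `last_login_time` column is added to the User table (the column name and the literal user id are placeholders):

```sql
-- Design 2: denormalized last-login column on the user table.
ALTER TABLE user ADD COLUMN last_login_time DATETIME NULL;

-- Write path: every login must also update the user row (ideally in the same transaction).
INSERT INTO login (user_id, login_time) VALUES (42, NOW());
UPDATE user SET last_login_time = NOW() WHERE id = 42;

-- Read path: no join needed.
SELECT id, last_login_time FROM user;
```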

Is that a generic pattern that you see when designing schemas?

Rohit Banga
  • I think a current session id is often used to keep track of this. – Aiias Mar 10 '13 at 05:21
  • I just used this as an example, I am referring to the generic one to many design principles. Also, I want to make this question community wiki. – Rohit Banga Mar 10 '13 at 05:24
  • You are evaluating performance aspects in the design phase. Keep in mind that [premature optimization is the root of all evil](http://en.wikipedia.org/wiki/Premature_optimization#When_to_optimize) – A. Rodas Mar 10 '13 at 05:43
  • You can't answer this question so early in design, listen to A. Rodas here. Finish your design _then_ do your perf evaluations using relevant [test data](http://www.generatedata.com). If you find an issue then re-evaluate and make a change if you need to. – Just Aguy Mar 10 '13 at 05:49

1 Answer


You are confusing the concepts of TABLE and RELATION, a common mistake. You have two RELATIONS in your conceptual model (Users & Logins), but in practice this will involve more than two TABLES in your physical model, as non-clustered indices are nothing more than additional TABLES to speed up the joining of multiple RELATIONS.

Once the INDEX (UserID, LoginTime) exists on Logins to support the FK relationship to Users, the query to find the most recent login for a user is covered by the non-clustered index. Only when a known, measurable, severe performance problem has been identified with this default model would one look to denormalize, as this (like all denormalizations) introduces a performance hit for EVERY OTHER READ AND WRITE operation on the denormalized table.
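
As a sketch of that point (the index name and the literal user id are placeholders, reusing the illustrative table names from the sketches above rather than the answer's Users/Logins):

```sql
-- Composite index on (user_id, login_time): the most-recent-login lookup can be
-- answered from the index alone, without touching the base table.
CREATE INDEX ix_login_user_time ON login (user_id, login_time);

SELECT MAX(login_time) AS last_login_time
FROM login
WHERE user_id = 42;
```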

Pieter Geerkens
  • hmm ... interesting ... maybe I am missing the point ... but when I query for the recent logins for all users I will generate a query that looks like: `SELECT user.id, login.login_time FROM USER user, LOGIN login WHERE user.id = login.user_id AND login.login_time >= (SELECT login_time from LOGIN where user_id=user.id)`. Is that correct? How will a non-clustered index speed up this query? Will the following query be faster in the presence of this index? `SELECT U.id, L1.login_time FROM USER U1, LOGIN L1, LOGIN L2 WHERE U1.id=L1.user_id AND U1.id=L2.user_id AND L1.login_time >= L2.login_time`. – Rohit Banga Mar 10 '13 at 19:45
  • I guess there are some problems with these queries, and I need to figure out how to fix them. Assuming this join is efficient with a non-clustered index, I will need to make sure Hibernate generates these indices for me when using hbm2ddl, and also make sure my JPQL translates to the more efficient SQL query. In some cases I have seen that just creating a redundant column made the final data analysis much faster, although I did not create the relevant non-clustered indices. – Rohit Banga Mar 10 '13 at 20:12
  • The query "select UserName, u.UserID, LastLoginTime = Max(LoginTIme) from Users u left join Logins l on u.UserId = l.UserID group by u.UserID,UserName" will be coverd by a non-clustered index. – Pieter Geerkens Mar 11 '13 at 01:54
  • Thanks.First off my specific usecase has different table names but that should not matter. Equivalently I have different device types (Desktop, Mobile, Other) in the LOGIN table and I want to get the last login times for all users for each of the device type, I just group by u.UserID, U.DeviceType. I create a BTREE index (UserID, LoginTime, DeviceType). USING HASH also creates a BTREE index for some reason. It gives me the right result. But there seems to be no performance improvement. MySQL command line reports ~0.31 seconds with or without that index. The query returns 43000 rows. – Rohit Banga Mar 11 '13 at 18:02
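
A rough sketch of the per-device-type query and index discussed in the last comment (table and column names are assumptions pieced together from the thread, not the original schema):

```sql
-- Assumes the login table also carries a device_type column, as described in the comment.
ALTER TABLE login ADD COLUMN device_type VARCHAR(20);

-- Index ordered so the grouped lookup can be answered from the index alone.
CREATE INDEX ix_login_user_device_time ON login (user_id, device_type, login_time);

-- Last login time per user and device type.
SELECT l.user_id, l.device_type, MAX(l.login_time) AS last_login_time
FROM login l
GROUP BY l.user_id, l.device_type;
```

Putting `device_type` before `login_time` lets the grouped `MAX` be read straight off the index; the (UserID, LoginTime, DeviceType) order mentioned in the comment still covers the query but is less helpful for the grouping.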