
I have a rather troublesome table that looks like this:

EventTimeLog

Id (bigint) | Time (datetime) | LogId (FK to the Log table's Id column)

This table records the times at which a Log event occurred. This way each unique log event is written only once, and the times it occurred can be derived from this table.
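For reference, the tables look something like this (a simplified sketch; the exact types and constraints here are illustrative, and the real Log table has more columns than shown):

-- Illustrative DDL only; column list and types beyond the post are assumptions
CREATE TABLE [Log] (
    Id        BIGINT IDENTITY(1,1) PRIMARY KEY,
    UserId    BIGINT NOT NULL,
    [User]    NVARCHAR(100) NOT NULL,
    Ip        VARCHAR(45) NOT NULL,
    AccountId BIGINT NOT NULL
    -- ... additional columns omitted
);

CREATE TABLE EventTimeLog (
    Id     BIGINT IDENTITY(1,1) PRIMARY KEY,
    [Time] DATETIME NOT NULL,
    LogId  BIGINT NOT NULL REFERENCES [Log] (Id)
);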

The problem is that the times are very precise, and they can contain duplicate values. For example:

2015-08-03 23:54:58.000 | 1983
2015-08-03 23:54:58.000 | 1934
2015-08-03 23:54:56.000 | 1647

After some time it becomes very difficult to query. Usually around 500k rows or so it starts to chug, even if I put an index on LogId and Time. By the time I hit the 1 million range and up, queries slow to a crawl...

I do need these specific times, so aggregating by the start of the hour or day is not an option. I also doubt implementing a Count column for duplicate time values would help much, as these times are so granular; the index would still have to step through all those specific values, which is very slow.

I'm unsure how to make this table more scalable. Maybe break it out into monthly tables?

As requested, here is the query that starts chugging:

SELECT b.[User], b.Ip, b.AccountId, a.Time
FROM EventTimeLog a
INNER JOIN [Log] b ON a.LogId = b.Id
WHERE b.UserId = '<param>'
  AND a.Time >= '<param>'
  AND a.Time <= '<param>'

If the time range spans more than 2 days, it chugs like crazy. And yes, I have an index on Log.UserId.
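For illustration, the indexes described above would look roughly like this (the index names are made up):

-- Hypothetical names; these mirror the indexes described in the post
CREATE INDEX IX_EventTimeLog_LogId_Time ON EventTimeLog (LogId, [Time]);
CREATE INDEX IX_Log_UserId ON [Log] (UserId);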

mBrice1024
  • Why is this information not in the log tables? – Paolo Sep 03 '15 at 12:40
  • 500K rows and your query starts chugging? Hmmm, have you looked at the resource counters, memory and CPU etc...? 500K rows with only 3 columns is not really a big table; a SQL Server with reasonable resources should be able to handle it fairly nicely. – M.Ali Sep 03 '15 at 12:42
  • Can you show us the query that becomes so slow? – Thorsten Kettner Sep 03 '15 at 12:43
  • Added the slow query – mBrice1024 Sep 03 '15 at 13:25
  • Maybe you need to tune your queries and indexes (hard to say without more info). The only thing I can say for sure is: logs tend to grow fast but also lose meaning just as fast, and if it is rarely used you don't even need to bother improving its query performance. First, keep in mind your requirements: why do you have a log and how useful is it? Second: can you slice it day-by-day and just keep fresh rows ready to use? – jean Sep 03 '15 at 13:43
  • I'm with @Paolo on this. It seems over-normalized to have a LogTime table, especially when it appears to be a one-to-one relationship. Why not just have a LogTime column in your logging table? What would help here is if you could post the actual DDL and indexes for both tables. – Sean Lange Sep 03 '15 at 13:46

1 Answer


Your table contains many time values that differ only slightly. An index on such granular values is huge without being very helpful.

So consider a computed column instead, giving you a less precise time. Then create an index on logid plus this computed column.

I don't know what time spans you usually query. Let's use hours as an example. You'd truncate your date to the hour, either as a datetime (dateadd(hour, datediff(hour, 0, time), 0)) or as a string (substring(convert(varchar(25), time, 120), 1, 13)).
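Both expressions produce the same hour bucket; a quick illustration (the sample value is made up):

-- both variants truncate a timestamp to the hour
select dateadd(hour, datediff(hour, 0, t.[time]), 0)            as hour_datetime,
       substring(convert(varchar(25), t.[time], 120), 1, 13)    as hour_string
from (select cast('2015-09-03 14:12:10' as datetime) as [time]) t;
-- hour_datetime: 2015-09-03 14:00:00.000
-- hour_string:   2015-09-03 14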

The new column (I use a string here):

alter table eventtimelog add comp_hour as substring(convert(varchar(25), time, 120), 1, 13);

So you'd get for instance:

time                        comp_hour
2015-09-03 14:12:10.217     '2015-09-03 14'
2015-09-03 14:45:27.446     '2015-09-03 14'

The index:

create index index_comp_hour on eventtimelog(logid, comp_hour);

The query:

select l.[user], l.ip, l.accountid, e.time
from [log] l
join eventtimelog e on e.logid = l.id and e.comp_hour in ('2015-09-03 13', '2015-09-03 14')
where l.userid = 123;

(I am not sure, though, whether the index eventtimelog(logid, comp_hour) or eventtimelog(comp_hour, logid) would be better, or if it even matters. You can just create both, look at the execution plan, and remove the one that is not used.)
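If a real datetime column is more convenient than a string (for instance, to use range predicates instead of an IN list), the dateadd variant from above works the same way. A sketch with the same hypothetical names:

-- dateadd/datediff is deterministic, so this computed column can be indexed
alter table eventtimelog add comp_hour_dt as dateadd(hour, datediff(hour, 0, time), 0);
create index index_comp_hour_dt on eventtimelog(logid, comp_hour_dt);

select l.[user], l.ip, l.accountid, e.time
from [log] l
join eventtimelog e on e.logid = l.id
where l.userid = 123
  and e.comp_hour_dt >= '2015-09-03 13:00'
  and e.comp_hour_dt <  '2015-09-03 15:00';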

Thorsten Kettner
  • Sorry it took so long to get back to you. This seemed to do the trick. I added a new TimeAggregate column for this, then dropped and remade an index on LogId and TimeAggregate, and also did an INCLUDE on the Time column. Cut time down drastically! – mBrice1024 Sep 05 '15 at 19:44
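Based on that comment, the final setup presumably looked something like this (column and index names are reconstructed from the comment, so treat them as guesses):

-- Guessed names, reconstructing the commenter's description
alter table EventTimeLog add TimeAggregate as substring(convert(varchar(25), [Time], 120), 1, 13);

create index IX_EventTimeLog_LogId_TimeAggregate
    on EventTimeLog (LogId, TimeAggregate)
    include ([Time]);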