7

I have a large table (> 50m rows) which has some data with an ID and timestamp:

id, timestamp, data1, ..., dataN

...with a multi-column index on (id, timestamp).

I need to query the table to select all rows with a certain ID where the timestamp is between two dates, which I am currently doing using:

SELECT * FROM mytable WHERE id = x AND timestamp BETWEEN y AND z

This currently takes over 2 minutes on a high end machine (2x 3Ghz dual-core Xeons w/HT, 16GB RAM, 2x 1TB drives in RAID 0) and I'd really like to speed it up.

I have found this tip which recommends using a spatial index, but the example it gives is for IP addresses. However, the speed increase (436s to 3s) is impressive.

How can I use this with timestamps?

Roger
  • 71
  • 1
  • 1
  • 3
  • * Define high end machine? Really... 50 million rows is now a large table by todays standards. * So, the table has ID - where is the timestamp? * What indices? Missing index plus crappy hardware (high end + normal discs) could lead to those results. – TomTom Mar 17 '10 at 12:00
  • Dual 3GHz Xeon Dual-cores with HT, 16GB RAM etc. The table has (id, timestamp, data) with a multi-column index on (id, timestamp). – Roger Mar 17 '10 at 12:07
  • 1
    NOT high end by any means.... espeically as you dont say anything about the..... discs, which are the critical part for any database server. MUCH mor ethan the CPU, or the RAM. What is your disc layout? – TomTom Mar 17 '10 at 12:14
  • It is a desktop PC, not a server, and will only ever be used by one person at once. Disc-wise, it has 2x 1TB drives in RAID 0. – Roger Mar 17 '10 at 12:18
  • Ah, here we go. Crap IO performance. Typical 1TB drives, dsktop, are 7200 RPM. Raid 0 makes it a LITTLE better, but... well... double crap still is crap. I would not say this is up to a fast 50.000.000 row scan if a table scan is decided upon ;) – TomTom Mar 17 '10 at 12:20
  • How selective is X - i.e. how many rows do you expect things to have? How many different ID's are there? – TomTom Mar 17 '10 at 12:21
  • The hardware is out of my control though. I'd hoped that something similar to the IP scan (with a clever index, as shown in the link) would maybe squeeze out some more performance. * X varies between a few thousand and 10's of millions, but is usually in the region of 4m rows. – Roger Mar 17 '10 at 12:23
  • Really depends. You could try an opposite index (TIMESTAMP first, then index). That may be more selective. – TomTom Mar 17 '10 at 12:25
  • post the output of EXPLAIN on your query. – nos Mar 17 '10 at 14:01

3 Answers3

6

That tip is only suitable when you have two columns A and B and use queries like:

where 'a' between A and B

That's not:

where A between 'a' and 'b'

Using index on date(column) rather than column could speed it up a little bit.

Konrad Garus
  • 53,145
  • 43
  • 157
  • 230
  • The tip does mention that it is suitable for time, "Fast interval (of time or ip addresses) searching with spatial indexes"? Thanks for the other hint. – Roger Mar 17 '10 at 15:38
  • It is suitable for time. But it only is suitable if your table has intervals (i.e. two columns) and you look for a value there. Not when you compare single column to an interval. – Konrad Garus Mar 17 '10 at 16:29
  • Ahhh, I see. Sorry, I'd misunderstood the application. – Roger Mar 17 '10 at 16:31
1

Could you EXPLAIN the query for us? Then we know how the database executes your query. And what about the configuration? What are the settings for shared_buffers and work_mem? And when did you (or your system) the last vacuum and analyze? And last thing, what OS and pgSQL-version are you using?

You can create wonderfull indexes but without proper settings, the database can't use them very efficient.

Frank Heikens
  • 117,544
  • 24
  • 142
  • 135
0

Make sure the index is TableID+TableTimestamp, and you do a query like:

SELECT
    ....
    FROM YourTable
    WHERE TableID=..YourID.. 
        AND TableTimestamp>=..startrange.. 
        AND TableTimestamp<=..endrange..

if you apply functions to the table's TableTimestamp column in the WHERE, you will not be able to completely use the index.

if you are already doing all of this, then your hardware might not be up to the task.

if you are using version 8.2 or later, you should try:

WHERE (TableID, TableTimestamp) >= (..YourID.., ..startrange.. ) 
    and (TableID, TableTimestamp) <= (..YourID.., ..endrange..)
KM.
  • 101,727
  • 34
  • 178
  • 212