
I have been struggling with a marker-clustering problem with 1000+ markers that should be put on a Google map. I am not very keen on rendering large JSON structures with all the markers, nor am I fond of complex server-side "geo" computations with PostGIS.

The solution I came up with is to divide the world map into some sort of hierarchical spatial tree, let's say a quad tree, where each point in my db will be assigned "coordinates" in that tree. These coordinates are strings in which the character at position x is the index of the tile at tier x, e.g. '031232320012'. The length of the string depends on the number of zoom levels that will be enabled for the front-end map. Basically, if a user moves or zooms the map, I'll launch an Ajax GET request with the current zoom level and viewport coordinates as parameters. Then in the back-end I plan to build a string that points to the "viewport at the given zoom level", e.g. '02113', and I want to find all points whose tree coordinates have this prefix ('02113').
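For illustration, such tree coordinates can be built the same way as Bing-style "quadkeys". A sketch as a PostgreSQL function (the function name and the Web Mercator tile math are illustrative, not part of my schema):

CREATE FUNCTION quadkey(lat double precision, lng double precision, zoom int)
RETURNS text AS $$
DECLARE
    n double precision := 2 ^ zoom;  -- tiles per axis at this zoom level
    x int;  -- tile column
    y int;  -- tile row
    key text := '';
    digit int;
BEGIN
    -- standard Web Mercator point-to-tile math (no clamping at the poles)
    x := floor((lng + 180.0) / 360.0 * n);
    y := floor((1.0 - ln(tan(radians(lat)) + 1.0 / cos(radians(lat))) / pi()) / 2.0 * n);
    -- one digit (0-3) per tier, most significant tile bit first
    FOR i IN REVERSE zoom..1 LOOP
        digit := ((y >> (i - 1)) & 1) * 2 + ((x >> (i - 1)) & 1);
        key := key || digit::text;
    END LOOP;
    RETURN key;
END;
$$ LANGUAGE plpgsql IMMUTABLE;

A point's column value would then be quadkey(lat, lng, max_zoom), assuming lat/lng columns.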

EDIT: I will also need a fast GROUP BY, e.g. SELECT count(*) FROM points GROUP BY left(coordinates, 5);

My question is: how do I perform these operations as fast as possible? My database is PostgreSQL.

clime
  • I'd take another look at whether you are better off with PostgreSQL's geo support. You don't have to go the whole hog with PostGIS. You can index on bounding boxes and even do nearest-neighbour tests in recent versions. Failing that, Mike's answer below looks good. – Richard Huxton Apr 23 '13 at 18:17

1 Answer


Then in the back-end I plan to build a string that points to the "viewport at the given zoom level", e.g. '02113', and I want to find all points whose tree coordinates have this prefix ('02113').

An ordinary index should perform well on any modern DBMS as long as you're looking at the left-most five (or six or seven) characters of a string in an indexed column.

SELECT column_name
FROM your_table
WHERE column_name LIKE '02113%';
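One caveat, assuming the database uses a locale other than C (the usual case): a plain btree index can't serve LIKE prefix matches under such a collation, so the index may need the text_pattern_ops operator class (index name illustrative):

CREATE INDEX your_prefix_idx ON your_table (column_name text_pattern_ops);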

In PostgreSQL, you can also build an index on an expression. So you could create an index on the first five characters.

CREATE INDEX your_index_name ON your_table (left(column_name, 5));

I'd expect PostgreSQL's query optimizer to pick the right index even if there were three or four like that (one for 5 characters, one for 6 characters, and so on).
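Concretely, that might be (index names illustrative):

CREATE INDEX your_table_left5_idx ON your_table (left(column_name, 5));
CREATE INDEX your_table_left6_idx ON your_table (left(column_name, 6));
CREATE INDEX your_table_left7_idx ON your_table (left(column_name, 7));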


I built a table, and populated it with a million rows of random data.
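A comparable setup, for illustration (the table and column names match the query below; the exact definition and population method here are assumptions):

CREATE TABLE coords AS
SELECT lpad((floor(random() * 1e12))::bigint::text, 12, '0') AS s  -- 12 random digits
FROM generate_series(1, 1000000);

CREATE INDEX coords_left_idx1 ON coords (left(s, 5));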

In the following query, PostgreSQL's query optimizer did pick the right index.

explain analyze
select s
from coords
where left(s, 5) = '12345';

It returned in 0.1 ms.

I also tested using GROUP BY. Again, PostgreSQL's query optimizer picked the right index.
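The grouped query was along these lines (a reconstruction for illustration; the two-character prefix matches the ninety groups in the plan below, and an index on the same expression is assumed):

explain analyze
select left(s, 2), count(*)
from coords
group by left(s, 2);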

"GroupAggregate  (cost=0.00..62783.15 rows=899423 width=8) (actual time=91.300..3096.788 rows=90 loops=1)"
"  ->  Index Scan using coords_left_idx1 on coords  (cost=0.00..46540.36 rows=1000000 width=8) (actual time=0.051..2915.265 rows=1000000 loops=1)"
"Total runtime: 3096.914 ms"

An expression like left(name, 2) in the GROUP BY clause requires PostgreSQL to touch every row in the index, if not every row in the table. That's why my query took 3096 ms: it had to touch a million rows in the index. But you can see from the EXPLAIN output that it used the index.

Ordinarily, I'd expect a geographic application to use a bounding box against a PostGIS table to reduce the number of rows you access. If your quad tree implementation can't do better than that, I'd stick with PostGIS long enough to become an expert with it. (You won't know for sure that it can't do the job until you've spent some time in it.)
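For comparison, a minimal bounding-box filter of that kind might look like this (the table, column, and coordinates are illustrative, assuming a geometry column in SRID 4326):

CREATE INDEX points_geom_idx ON points USING gist (geom);

SELECT id
FROM points
WHERE geom && ST_MakeEnvelope(-74.1, 40.6, -73.7, 40.9, 4326);  -- current viewport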

Mike Sherrill 'Cat Recall'
  • The problem with that index on an expression is that there will probably be more than 10 indexes then. I'll try the ordinary index. – clime Apr 23 '13 at 17:10
  • I've just realized I also need a fast GROUP BY, e.g. `SELECT count(*) from points GROUP BY left(coords, 5)`. Is there any way to make GROUP BY faster? – clime Apr 23 '13 at 17:55
  • The number of indexes is *probably* not a problem, but I haven't built anything quite like you're talking about. (It's easy enough to test, though.) You can make GROUP BY faster by creating an index on `left(coords, 5)`. Indexes and sargable expressions (WHERE clauses that can actually *use* an index) are your main sources of speed. `EXPLAIN ANALYZE SELECT...` is your friend. – Mike Sherrill 'Cat Recall' Apr 23 '13 at 18:46
  • Are you sure that indexes on expressions help with GROUP BY? I have this index: `"test" btree ("left"(name::text, 2))` on one of my test tables, but the query plan for `select count(*) from web_city GROUP BY left(name,2);` is exactly the same as without it. – clime Apr 23 '13 at 18:51
  • The same goes for ORDER BY. WHERE, on the other hand, is influenced by the index if I do `WHERE left(name, 2) = 'ab';`. – clime Apr 23 '13 at 18:58
  • @clime: Yes. I expanded my answer. The expression `GROUP BY left(name, 2)` forces PostgreSQL to touch every entry in the index. (A million entries, in my case.) – Mike Sherrill 'Cat Recall' Apr 23 '13 at 19:10
  • Lucky you, mine just refuses to pick the index. It picks it with WHERE but not with GROUP BY. What is your PostgreSQL version, please? Mine is 9.2. – clime Apr 23 '13 at 19:21
  • I made that a separate question, if you would be so kind as to look: http://stackoverflow.com/questions/16177825/why-postresql-9-2-does-not-use-index-scan-if-it-can – clime Apr 23 '13 at 19:42
  • I'm using 9.1 here. If you don't have a *lot* of rows, the optimizer might pick a table scan anyway. – Mike Sherrill 'Cat Recall' Apr 23 '13 at 19:56
  • Ha, with your help, I figured it out. In fact, I'll need just two or three indexes on that column, for the lowest zoom levels. After that, lots of rows will get filtered out by the viewport filtering! I am glad I can avoid the geographical stuff with PostGIS and GeoDjango (yes, I am using Django), because this app is not really geographical. It just happens to keep records of some points on the world, but its main purpose is more like a social network. – clime Apr 23 '13 at 22:35