6

I take part in some Django poroject and we use geo data (with GeoDjango). I have installed PostGis as it described on AWS docs.

We have a lot of some points (markers) on the map. And we need to cluster them.

I found one library anycluster. This library need the PostgreSQL extension named kmeans-postgresql be installed on the Postgre database.

But my database is located on Amazon RDS. And I can't connect to it by SSH in order to install an extension...

Anybody knows how can I install kmeans-postgresql extension on my Amazon RDS database?

Or maybe you can advise me other ways of clustering?

Anton
  • 420
  • 5
  • 15

2 Answers2

5

The K-Means It is a really complex calculation that is useful to data mining and cluster analysis ( you can see more about it in the wikipedia page https://en.wikipedia.org/wiki/K-means_clustering ). It have a big complexity when have to deal with many points. The K-means extension to postgresql http://pgxn.org/dist/kmeans/doc/kmeans.html it is written in C and compiled in the database machine. This brings a better performance compared to an procedure in plpgsql. Unfortunately as @estevao_lucas answered, this extension it is not enabled into Amazon RDS.

If you really need the k-means result, I translated this implementation of it, created by Joni Salonen in http://jonisalonen.com/2012/k-means-clustering-in-mysql/ and changed to plpgsql https://gist.github.com/thiagomata/a9737c3455d6248bef9f. This function uses temporary table. It is possible change it to use only arrays of Pins, if you wanna to.

But, if you only need to show some pins in a map, you will probably be happy with a really faster and simpler function that groups the results into an [x,y] matrix. I have created such function because the kmeans function was taking too much time to process my database (with a lot more than 400K elements). So this implementation is really faster, but does not have all the features you would expect from the K-means module. Besides that, this grid function https://gist.github.com/thiagomata/18ea14853998468c1a1d returns very good results, when the goal it is to show a big number of pins in a map. Example of Grid Result

culebrón
  • 34,265
  • 20
  • 72
  • 110
Thiago Mata
  • 2,825
  • 33
  • 32
  • Thank you! Your functions are very useful! I decided do not use k-means because I need dynamic solution. And I have to cluster my markers on the server side. My markers are often filtered by users. My solution based on Quadtree. Each marker has its quadcode (lat,lon are constant). So you can query it and cluster through PostGis and a small SQL request. – Anton May 26 '16 at 12:13
1

You can just install supported extensions on Amazon RDS and Kmeans isn't it.

ERROR: Extension "kmeans" is not supported by Amazon RDS DETAIL: Installing the extension "kmeans" failed, because it is not on the list of extensions supported by Amazon RDS. HINT: Amazon RDS allows users with rds_superuser role to install supported extensions. See: SHOW rds.extensions; alexandria_development=> SHOW rds.extensions

RDS extensions:

btree_gin, btree_gist, chkpass, citext, cube, dblink, dict_int, dict_xsyn, earthdistance, fuzzystrmatch, hstore, intagg, intarray, isn, ltree, pgcrypto, pgrowlocks, pg_prewarm, pg_stat_statements, pg_trgm, plcoffee, plls, plperl, plpgsql, pltcl, plv8, postgis, postgis_tiger_geocoder, postgis_topology, postgres_fdw, sslinfo, tablefunc, test_parser, tsearch2, unaccent, uuid-ossp

Estevão Lucas
  • 4,440
  • 34
  • 37