I’m building an application that pulls lat/long values from a database and plots them on a Google Map. There could be thousands of data points so I “cluster” points close to each other so the user is not overwhelmed with icons. At the moment I perform this clustering in the application, with a simple algorithm like this:
- Get array of all points
- Pop first point off array
- Compare first point to all other points in array looking for ones that fall within x distance
- Create a cluster with the original and close points.
- Remove close points from array
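The steps above can be sketched roughly as follows in Python. This is only an illustration: the function name `greedy_clusters`, the use of plain Euclidean distance on the raw lat/long values, and the assumption that the steps repeat until the array is empty are all mine.

```python
import math

def greedy_clusters(points, max_dist):
    """Group (lat, lng) tuples with the pop-and-compare steps listed above.

    Assumption: distance is plain Euclidean distance in degrees, which is
    only a rough stand-in for real distance on the globe.
    """
    remaining = list(points)          # work on a copy of the input array
    clusters = []
    while remaining:                  # assumed: repeat until nothing is left
        first = remaining.pop(0)      # pop the first point off the array
        close, far = [], []
        for p in remaining:           # compare it to every other point
            if math.dist(first, p) <= max_dist:
                close.append(p)
            else:
                far.append(p)
        clusters.append([first] + close)  # cluster = original + close points
        remaining = far                   # remove close points from the array
    return clusters
```

Each pass scans every remaining point, so the worst case (no two points close together) costs on the order of n² distance checks.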
Now I realize this is inefficient, which is why I have been looking into GIS systems. I have set up PostGIS and have my lat/long values stored in a POINT geometry object.
Can someone get me started or point me to some resources on a simple implementation of this clustering algorithm in PostGIS?
I ended up using a combination of snaptogrid and avg. I realize there are algorithms out there (e.g. kmeans, as Denis suggested) that will give me better clusters, but for what I’m doing this is fast and accurate enough.
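The grid-and-average idea can be sketched in Python as below. This is not the actual query that was used; it only mirrors the logic of snapping each point to a grid cell and averaging the points that land in the same cell. The function name `grid_clusters` and the cell size are assumptions for illustration.

```python
from collections import defaultdict

def grid_clusters(points, cell_size):
    """Snap (lat, lng) tuples to a square grid and average each cell.

    Returns a list of (avg_lat, avg_lng, count) tuples, one per occupied
    cell. cell_size is in degrees and is a hypothetical tuning parameter.
    """
    cells = defaultdict(list)
    for lat, lng in points:
        # Snap to the nearest grid node by rounding to a cell index.
        key = (round(lat / cell_size), round(lng / cell_size))
        cells[key].append((lat, lng))

    clusters = []
    for members in cells.values():
        n = len(members)
        avg_lat = sum(m[0] for m in members) / n
        avg_lng = sum(m[1] for m in members) / n
        clusters.append((avg_lat, avg_lng, n))
    return clusters
```

This takes one pass over the points, which is why it is fast. The trade-off is that two points sitting on either side of a cell boundary end up in different clusters even when they are very close.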
If it’s enough to have stuff clustered in your browser, you could easily make use of OpenLayers’ clustering capabilities. There are 3 examples that show clustering.
I’ve used it with a PostGIS database before, and as long as you don’t have ridiculous amounts of data, it works pretty smoothly.
An example of clustering lonlat points (of st_point type) with PostGIS. The result set will contain (id, cid) pairs, where cid is the cluster ID. The number of clusters is the argument passed to ST_ClusterKMeans.
WITH sparse_places AS (
    SELECT lonlat, id, COUNT(*) OVER() AS count
    FROM places
)
SELECT sparse_places.id,
       ST_ClusterKMeans(lonlat::geometry, LEAST(count::integer, 10)) OVER() AS cid
FROM sparse_places;
We need the Common Table Expression with a COUNT window function to make sure the number of clusters passed to ST_ClusterKMeans never exceeds the number of input rows: LEAST(count, 10) asks for 10 clusters, or fewer when the table has fewer than 10 rows.
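To show what the cap does, here is a toy k-means in Python. It is not how PostGIS implements ST_ClusterKMeans; the function names, the fixed iteration count, and the seeded random initialization are all assumptions of this sketch. The only part taken from the query is the cap on k, which mirrors LEAST(count, 10).

```python
import random

def nearest(p, centroids):
    """Index of the centroid closest to point p (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda c: (p[0] - centroids[c][0]) ** 2 + (p[1] - centroids[c][1]) ** 2,
    )

def kmeans_labels(points, k, iterations=20, seed=0):
    """Toy Lloyd's k-means on (x, y) tuples; returns one cluster id per point."""
    k = min(k, len(points))           # same cap as LEAST(count, 10) in the query
    if k == 0:
        return []
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct input points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        labels = [nearest(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return labels
```

With only two input points and a requested k of 10, the cap reduces k to 2, so the sketch never tries to pick more starting centroids than there are points.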
I wrote a somewhat longer description of how I do clustering in PostGIS here.