php – Cluster points in PostGIS

Posted by: admin, July 12, 2020


I’m building an application that pulls lat/long values from a database and plots them on a Google Map. There could be thousands of data points so I “cluster” points close to each other so the user is not overwhelmed with icons. At the moment I perform this clustering in the application, with a simple algorithm like this:

  1. Get array of all points
  2. Pop first point off array
  3. Compare first point to all other points in array looking for ones that fall within x distance
  4. Create a cluster with the original and close points.
  5. Remove close points from array
  6. Repeat

Now I realize this is inefficient, which is why I have been looking into GIS. I have set up PostGIS and have my lat/long values stored in a POINT geometry column.

Can someone get me started or point me to some resources on a simple implementation of this clustering algorithm in PostGIS?

Answers:

I ended up using a combination of ST_SnapToGrid and avg. I realize there are algorithms out there (e.g. k-means, as Denis suggested) that will give me better clusters, but for what I’m doing this is fast and accurate enough.
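
For anyone wanting to try the same approach, here is a minimal sketch of how snapping to a grid and averaging could fit together. It is not the exact query I used: the table places, the columns id and lonlat, and the 0.01 degree cell size are all assumptions for illustration, so adjust them to your schema and zoom level.

```sql
-- Sketch only: "places", "lonlat" and the 0.01 cell size are assumed names/values.
-- Every point is snapped to the nearest grid node; points that share a node
-- form one cluster, and the marker is placed at the average of their coordinates.
SELECT
    COUNT(*)                    AS point_count,  -- number of points in the cluster
    AVG(ST_X(lonlat::geometry)) AS lon,          -- average longitude of the cluster
    AVG(ST_Y(lonlat::geometry)) AS lat           -- average latitude of the cluster
FROM places
GROUP BY ST_SnapToGrid(lonlat::geometry, 0.01);
```

A larger cell size gives fewer, coarser clusters, so one option is to pick the size based on the current map zoom level.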


If it’s enough to have the points clustered in the browser, you could make use of OpenLayers’ clustering capabilities. There are 3 examples that show clustering.

I’ve used it with a PostGIS database before, and as long as you don’t have ridiculous amounts of data, it works pretty smoothly.


An example of clustering lonlat points (of st_point type) with PostGIS. The result set will contain (cluster_id, id) pairs. The number of clusters is the argument passed to ST_ClusterKMeans.

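WITH sparse_places AS (
  -- Carry the total row count alongside every row
  SELECT lonlat, id, COUNT(*) OVER() AS count
  FROM places
)
SELECT
  -- Ask for at most 10 clusters, capped at the number of rows
  ST_ClusterKMeans(lonlat::geometry, LEAST(count::integer, 10)) OVER() AS cid,
  id
FROM sparse_places;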

We need the Common Table Expression with a COUNT window function in order to make sure the number of clusters provided to ST_ClusterKMeans never exceeds the number of input rows. Here LEAST caps it at 10, or at the row count when there are fewer than 10 rows.

I wrote a somewhat longer description of how I do clustering in PostGIS here.