php – Cluster points in PostGIS

Posted by: admin, July 12, 2020

Questions:

I’m building an application that pulls lat/long values from a database and plots them on a Google Map. There could be thousands of data points so I “cluster” points close to each other so the user is not overwhelmed with icons. At the moment I perform this clustering in the application, with a simple algorithm like this:

  1. Get array of all points
  2. Pop first point off array
  3. Compare first point to all other points in array looking for ones that fall within x distance
  4. Create a cluster with the original and close points.
  5. Remove close points from array
  6. Repeat from step 2 until the array is empty
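For reference, the steps above can be sketched roughly as follows. This is only an illustration in Python, not the asker's actual application code: the function names `haversine_km` and `greedy_clusters`, the `(lat, lon)` tuple layout, and the use of great-circle distance in kilometres are all assumptions made for the example.

```python
import math

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))  # 6371 km: mean Earth radius

def greedy_clusters(points, max_km):
    """Repeatedly take a seed point and absorb every remaining point within max_km."""
    remaining = list(points)           # step 1: all points (copied, input untouched)
    clusters = []
    while remaining:                   # step 6: repeat until nothing is left
        seed = remaining.pop(0)        # step 2: pop the first point
        near, far = [], []
        for p in remaining:            # step 3: compare the seed to every other point
            (near if haversine_km(seed, p) <= max_km else far).append(p)
        clusters.append([seed] + near)  # step 4: the seed plus its close points
        remaining = far                 # step 5: close points leave the array
    return clusters
```

Each pass scans every remaining point, so the worst case (no two points close to each other) is quadratic in the number of points, which is the inefficiency the question describes.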

Now I realize this is inefficient, which is why I have been looking into GIS systems. I have set up PostGIS and have my latitudes and longitudes stored in a POINT geometry object.

Can someone get me started or point me to some resources on a simple implementation of this clustering algorithm in PostGIS?

Answers:

I ended up using a combination of ST_SnapToGrid and avg. I realize there are algorithms out there (e.g. k-means, as Denis suggested) that will give me better clusters, but for what I’m doing this is fast and accurate enough.
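The idea behind that approach is to bucket every point into a grid cell and report one averaged position per cell. The sketch below shows it in plain Python rather than SQL, purely as an illustration. The name `grid_clusters`, the `(lon, lat)` tuple layout, and the floor-based cell assignment are assumptions made for the example; PostGIS's own snapping may align cells differently. In PostGIS, the equivalent would presumably group rows by the snapped geometry and average their coordinates.

```python
import math
from collections import defaultdict

def grid_clusters(points, cell_size):
    """Bucket (lon, lat) points into square cells and average each bucket."""
    cells = defaultdict(list)
    for lon, lat in points:
        # Points that share a cell index end up in the same cluster.
        key = (math.floor(lon / cell_size), math.floor(lat / cell_size))
        cells[key].append((lon, lat))

    clusters = []
    for members in cells.values():
        n = len(members)
        clusters.append({
            "lon": sum(p[0] for p in members) / n,  # average position of the cell
            "lat": sum(p[1] for p in members) / n,
            "count": n,                             # number of points in the cell
        })
    return clusters
```

This makes a single pass over the points, so it scales linearly. The trade-off is that two nearby points on opposite sides of a cell boundary land in different clusters.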

Answer:

If it’s enough to have the points clustered in your browser, you could easily make use of OpenLayers’ clustering capabilities. There are three examples that show clustering.

I’ve used it with a PostGIS database before, and as long as you don’t have ridiculous amounts of data, it works pretty smoothly.

Answer:

An example of clustering lonlat points (of st_point type) with PostGIS. The result set will contain (cluster_id, id) pairs. The number of clusters is the argument passed to ST_ClusterKMeans.

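-- Attach the total row count to every row so it can cap the cluster count below.
WITH sparse_places AS (
  SELECT
    lonlat, id, COUNT(*) OVER() AS count
  FROM places
)
SELECT
  sparse_places.id,
  -- Ask for 10 clusters, or fewer when the table has fewer than 10 rows.
  ST_ClusterKMeans(lonlat::geometry, LEAST(count::integer, 10)) OVER() AS cid
FROM sparse_places;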

We need the Common Table Expression with a COUNT window function to make sure the number of clusters passed to ST_ClusterKMeans never exceeds the number of input rows: LEAST caps it at the row count when there are fewer than 10 rows.

I wrote a somewhat longer description of how I do clustering in PostGIS here.