I have a large number of photos and an RGB color map (say, about 100 colors). How can I group the pictures by color and obtain something like the following: http://labs.ideeinc.com/multicolr ?
My current idea is to use ImageMagick to do the following for each photo:
- Resize it to a smaller size so that it can be processed faster.
- Quantize it without dithering using my chosen color map.
- Get the photo’s histogram to obtain how many times each color appears.
- Store the colors in a database, although I haven’t figured out the best way to do this for fast retrieval.
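The quantize and histogram steps above can be sketched roughly as follows. This is only an illustration in plain Python, not the ImageMagick pipeline itself: the three-colour `palette`, the four-pixel `pixels` list, and both function names are made up for the example, and "nearest" is assumed to mean the smallest squared distance in RGB.

```python
from collections import Counter

def nearest_palette_index(pixel, palette):
    """Return the index of the palette colour closest to `pixel`
    (squared Euclidean distance in RGB, an assumption of this sketch)."""
    r, g, b = pixel
    return min(
        range(len(palette)),
        key=lambda i: (palette[i][0] - r) ** 2
        + (palette[i][1] - g) ** 2
        + (palette[i][2] - b) ** 2,
    )

def palette_histogram(pixels, palette):
    """Map every pixel to its nearest palette colour (no dithering) and
    return {palette_index: percent_of_pixels}."""
    counts = Counter(nearest_palette_index(p, palette) for p in pixels)
    total = sum(counts.values())
    return {idx: 100.0 * n / total for idx, n in counts.items()}

# Hypothetical 3-colour map and a tiny 4-pixel "image".
palette = [(255, 0, 0), (0, 255, 0), (0, 0, 255)]
pixels = [(250, 10, 10), (240, 0, 20), (10, 10, 200), (0, 250, 5)]
histogram = palette_histogram(pixels, palette)
```

The resulting percentages per palette index are the values that would be stored for each photo.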
Do you know of a better, more efficient way to do this? My language of choice is PHP since all the heavy processing will be done by ImageMagick, and the database is PostgreSQL.
Thank you in advance!
I see you have already worked out how to get the most relevant colors from each image. Don’t shrink the images too much, though, because aggressive downscaling can change the histogram.
The database could look something like this:
image_id | image_file
color_id | color_rgb
image_id | color_id | color_percent
The color_percent column is used in the WHERE clause and when aggregating per image:
select image_id, sum(color_percent)/count(color_percent) as relevance
from image_color
where color_id IN (175, 243) -- the colors you want to involve in this search
and color_percent > 10 -- this will drop results with lower significance
group by image_id
order by relevance desc
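To try the schema and query without a PostgreSQL server, here is a small sketch using Python's built-in `sqlite3` module. The table names `image` and `color`, the column types, the sample rows, and the `search` helper are assumptions made for the example; only `image_color` and the column names come from the answer. The results are sorted with the highest relevance first, which is also an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE image (image_id INTEGER PRIMARY KEY, image_file TEXT);
CREATE TABLE color (color_id INTEGER PRIMARY KEY, color_rgb TEXT);
CREATE TABLE image_color (
    image_id INTEGER,
    color_id INTEGER,
    color_percent REAL,
    PRIMARY KEY (image_id, color_id)
);
""")

# Made-up sample data: (image_id, color_id, color_percent)
conn.executemany("INSERT INTO image_color VALUES (?, ?, ?)", [
    (1, 175, 40.0), (1, 243, 30.0),  # both searched colours are prominent
    (2, 175, 60.0), (2, 243, 5.0),   # colour 243 falls below the 10% cut-off
    (3, 12, 90.0),                   # neither searched colour is present
])

def search(conn, color_ids, min_percent=10.0):
    """Return (image_id, relevance) rows for images containing the given colours."""
    placeholders = ", ".join("?" for _ in color_ids)
    sql = f"""
        SELECT image_id, SUM(color_percent) / COUNT(color_percent) AS relevance
        FROM image_color
        WHERE color_id IN ({placeholders}) AND color_percent > ?
        GROUP BY image_id
        ORDER BY relevance DESC
    """
    return conn.execute(sql, (*color_ids, min_percent)).fetchall()

results = search(conn, [175, 243])
```

With this sample data, image 2 (one strong match) ranks above image 1 (two moderate matches), because the relevance is an average over the matching rows only. Whether that is the desired ranking depends on the use case.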
Colours are essentially three-dimensional vectors (regardless of whether they are represented as HSV, RGB or CMY[K]). Unfortunately, relational databases mostly aren’t very good at working in more than one dimension.
If you reduce the image down to a single “average” colour then the solution becomes a bit simpler:
A trivial analysis would imply that you need to compare a new image with every existing image to determine the level of similarity. However, a better approach is to quantise the vector to a lower bit depth and then find matching values in the database.
e.g. for the 24-bit colour 124, 39, 201 (each channel rounded to the nearest available level):
as 1 bit colour: 0,0,1
as 2 bit colour: 1,0,2
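The reduction in this example can be reproduced with a few lines of Python. This sketch assumes each channel is rounded to the nearest of the available levels, which is what yields the values listed above (simply dropping the low bits would give 3 rather than 2 for the last channel at 2 bits). The function names and the idea of packing the result into a single integer key are illustrative additions.

```python
def quantise(rgb, bits):
    """Reduce each 8-bit channel to `bits` bits by rounding to the nearest level."""
    levels = (1 << bits) - 1  # highest value a channel can take, e.g. 3 for 2 bits
    return tuple(int(c * levels / 255 + 0.5) for c in rgb)

def bucket_key(rgb, bits):
    """Pack the quantised channels into one integer, usable as an indexed column."""
    r, g, b = quantise(rgb, bits)
    return (r << (2 * bits)) | (g << bits) | b

colour = (124, 39, 201)
one_bit = quantise(colour, 1)
two_bit = quantise(colour, 2)
key = bucket_key(colour, 2)
```

Images whose average colour falls into the same bucket then share the same key, so candidates can be found with an ordinary equality lookup instead of a comparison against every row.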
If you want to look at more colours in the image, then I’d recommend reducing it to the nearest values of a fixed colour map without error propagation (i.e. without dithering) and identifying the top N most frequently used colours. What you do after that will take some trial and error. You might need the method above, weighted by each colour’s frequency in the interim image, or you might get away with looking for images where the top N-M colours match N-X of your calculated values (with some tweaking of the M and X values).
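As a rough starting point for that experimentation, the top-N step and a simple overlap count might look like this. The pixel lists are made-up palette indices (the image is assumed to be already reduced to the fixed colour map), and both function names are hypothetical.

```python
from collections import Counter

def top_colours(palette_indices, n):
    """Return the n most frequent palette indices, most frequent first."""
    return [idx for idx, _ in Counter(palette_indices).most_common(n)]

def shared_top_colours(top_a, top_b):
    """Count how many colours two top-N lists have in common, ignoring order."""
    return len(set(top_a) & set(top_b))

# Two tiny made-up images, each pixel given as a palette index.
image_a = [3, 3, 3, 7, 7, 1]
image_b = [3, 3, 7, 7, 7, 9]

top_a = top_colours(image_a, 2)
top_b = top_colours(image_b, 2)
overlap = shared_top_colours(top_a, top_b)
```

Two images could then be treated as similar when `overlap` reaches some threshold, which is one of the values that would need tuning.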
I see what you need to do. The best way to approach the problem is to convert your images from RGB to the CIE L*a*b* (CIELAB) colour space.
Then you can calculate the distance between two colours in 3D space.
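A minimal sketch of that conversion and distance, assuming the input is 8-bit sRGB, the reference white is D65, and the distance is the plain Euclidean one (often called CIE76). The function names are illustrative, and the matrix coefficients are the commonly published four-decimal sRGB values.

```python
import math

def srgb_to_lab(rgb):
    """Convert an 8-bit sRGB triple to CIE L*a*b* (D65 white point assumed)."""
    def to_linear(c):
        # Undo the sRGB gamma curve.
        c = c / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (to_linear(c) for c in rgb)

    # Linear sRGB -> CIE XYZ.
    x = 0.4124 * r + 0.3576 * g + 0.1805 * b
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b
    z = 0.0193 * r + 0.1192 * g + 0.9505 * b

    def f(t):
        # Cube root with a linear segment near zero, as in the CIELAB definition.
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    # Normalise by the D65 reference white.
    fx, fy, fz = f(x / 0.95047), f(y / 1.0), f(z / 1.08883)
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))

def colour_distance(rgb1, rgb2):
    """Euclidean distance between two sRGB colours measured in L*a*b* space."""
    return math.dist(srgb_to_lab(rgb1), srgb_to_lab(rgb2))

near = colour_distance((255, 0, 0), (250, 0, 0))
far = colour_distance((255, 0, 0), (0, 0, 255))
```

Smaller distances mean the colours look more alike, so `near` comes out smaller than `far` here.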