2

I'm wondering whether it is possible to determine a site's "characteristic" color. For instance, TechCrunch would be green, ReadWriteWeb red, CNN also red, Microsoft bluish, PHP purple, etc.

It doesn't have to be accurate, just a best guess.

Some ideas I have in mind:

  • parse all css rules and find the one matching the most elements
  • parse all css rules and find background colors of the elements having the biggest dimensions
  • getting the body element's background image and finding its predominant color (is this possible for an image?)
  • somehow finding the site's "header" (first element in DOM with background css attribute set?) and getting its background

I would also need a way to eliminate blacks, greys and whites.

Is this feasible? Do you have any other ideas?

P.S. Sorry for my English

4 Answers

5

Feasible, definitely. You can download the site's CSS with a tool such as wget and use some simple regular expressions to parse out the colors. You can then collect all those colors and see which one is used most. However, that will not always be a good representation of the site's actual predominant color, because a color can occur in many CSS rules that apply to only a few elements on the page.

This is actually a nontrivial project you have here.

My approach would be as follows:

  • Download the CSS, parse out the colors and count how many different colors there are. If there are only a few colors in total, you're more likely to have found the predominant one. It's often the color used for <a> tags or <h1> tags (but not if they're grey or black/white).
  • When parsing, you should "pool" the colors so that, e.g. #FFEEEE is the same as #FFEAEA, as they're only marginally different.
  • You need to bring different CSS color notations into the same format, e.g. #FFF, #FFFFFF, "white", rgb(255,255,255), and so on.
  • You need a ruleset for this and a good knowledge of programming.
  • Finding the predominant colors in images is less trivial. The simplest approach is to determine, for every pixel, which of its R, G and B components is the largest. A pixel with the values R(120), G(240), B(80) will most likely look green. Then count this over all pixels and take the component that wins most often.
  • @mu is too short suggested converting the values into HSV and extracting only the hue.
  • A more advanced method would be to create a histogram of the three color components and then calculate the area under the histogram.
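As a rough sketch of the CSS part of the list above (normalising formats, pooling near-identical colors, dropping greys and counting), something like the following could work. All function names, the bucket size of 32 and the grey tolerance of 24 are assumptions made for illustration, and only a handful of named colors are listed. The regular expression is deliberately naive: it will also pick up ID selectors that happen to look like hex colors, such as `#add`.

```javascript
// Hypothetical, deliberately tiny table of named colors.
const NAMED_COLORS = new Map([
  ['white', [255, 255, 255]],
  ['black', [0, 0, 0]],
  ['red', [255, 0, 0]],
  ['green', [0, 128, 0]],
  ['blue', [0, 0, 255]],
]);

// Convert "#fff", "#ffffff", "rgb(255, 255, 255)" or a known name to [r, g, b].
// Returns null for anything this sketch does not understand.
function parseCssColor(text) {
  const s = text.trim().toLowerCase();
  if (NAMED_COLORS.has(s)) return NAMED_COLORS.get(s);
  let m = s.match(/^#([0-9a-f]{3})$/);
  if (m) return m[1].split('').map(h => parseInt(h + h, 16));
  m = s.match(/^#([0-9a-f]{6})$/);
  if (m) return [0, 2, 4].map(i => parseInt(m[1].slice(i, i + 2), 16));
  m = s.match(/^rgb\(\s*(\d{1,3})\s*,\s*(\d{1,3})\s*,\s*(\d{1,3})\s*\)$/);
  if (m) return m.slice(1, 4).map(Number);
  return null;
}

// Count hex and rgb() colors in a CSS string, pooling colors that fall into
// the same bucket, and return the first color seen in the fullest bucket.
function dominantCssColor(css, step = 32, greyTolerance = 24) {
  const buckets = new Map();
  const tokens = css.match(/#[0-9a-f]{3,6}\b|rgb\([^)]*\)/gi) || [];
  for (const token of tokens) {
    const rgb = parseCssColor(token);
    if (!rgb) continue;
    // Channels that are nearly equal mean black, grey or white: skip them.
    if (Math.max(...rgb) - Math.min(...rgb) < greyTolerance) continue;
    const key = rgb.map(c => Math.floor(c / step)).join(',');
    const entry = buckets.get(key) || { count: 0, sample: rgb };
    entry.count += 1;
    buckets.set(key, entry);
  }
  let best = null;
  for (const entry of buckets.values()) {
    if (!best || entry.count > best.count) best = entry;
  }
  return best ? best.sample : null;
}

const css = 'a { color: #0a0; } h1 { color: #00AA00; } ' +
  'body { background: #fff; color: rgb(0, 170, 5); } p { color: #333; } .x { color: #c00; }';
console.log(dominantCssColor(css));
```

Note that this only counts how often a color is written in the stylesheet, so it shares the weakness mentioned at the top of this answer: it says nothing about how much of the page actually ends up in that color.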

To sum it up, the task you're defining is worth a thesis, in my opinion :)

slhck
  • Thanks for your answer. I know it's REALLY REALLY nontrivial, I'm just looking for ideas. I was thinking of going the <a> route myself, but never thought about h1. Good idea! – Adrian Grigore Jan 12 '11 at 15:20
  • If I had more time I'd really work on this problem too, and build some scripts and tools. It's a great question and definitely worth the effort. At least for me :) I hope you have found some starting points. – slhck Jan 12 '11 at 15:24
  • You might have more success working in HSV rather than RGB, then you can simply ignore the saturation and value components and histogram the hues. – mu is too short Jan 12 '11 at 17:03
  • That's absolutely right. It's just that I've already done this in Java and you get the RGB components faster than HSV. There's always room for improvement in such algorithms. – slhck Jan 12 '11 at 20:41
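For what it's worth, the hue-histogram idea from the comments could be sketched roughly as below. The function names, the 30-degree bin size and the saturation/value cut-offs are assumptions for illustration. The comment suggests ignoring saturation and value entirely; this sketch uses them only to throw away blacks, greys and whites, as the question asks, and the histogram itself is over hue alone. Pixels are assumed to arrive as `[r, g, b]` triples in the 0-255 range.

```javascript
// Convert 0-255 RGB to HSV: hue in degrees, saturation and value in [0, 1].
function rgbToHsv(r, g, b) {
  const rn = r / 255, gn = g / 255, bn = b / 255;
  const max = Math.max(rn, gn, bn);
  const min = Math.min(rn, gn, bn);
  const delta = max - min;
  let h = 0;
  if (delta !== 0) {
    if (max === rn) h = ((gn - bn) / delta) % 6;
    else if (max === gn) h = (bn - rn) / delta + 2;
    else h = (rn - gn) / delta + 4;
    h *= 60;
    if (h < 0) h += 360;
  }
  return { h, s: max === 0 ? 0 : delta / max, v: max };
}

// Histogram the hues of sufficiently colorful pixels and return the centre
// (in degrees) of the fullest bin, or null if no pixel qualified.
function dominantHue(pixels, binSize = 30, minSaturation = 0.2, minValue = 0.2) {
  const bins = new Array(Math.ceil(360 / binSize)).fill(0);
  for (const [r, g, b] of pixels) {
    const { h, s, v } = rgbToHsv(r, g, b);
    if (s < minSaturation || v < minValue) continue; // black, grey or white
    bins[Math.floor(h / binSize) % bins.length] += 1;
  }
  const best = bins.indexOf(Math.max(...bins));
  return bins[best] === 0 ? null : best * binSize + binSize / 2;
}

const pixels = [
  [120, 240, 80], [100, 220, 90], // two greenish pixels
  [255, 255, 255], [0, 0, 0], [128, 128, 128], // white, black, grey
  [200, 30, 30], // one red pixel
];
console.log(dominantHue(pixels)); // centre of the winning hue bin, in degrees
```

Hue is circular, so reds near 0 and 360 degrees land in different bins here; a more careful version would need to merge those.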
2

OK, here comes a seriously unorthodox approach:

Use a screen-capturing package[1][2] to render the given URL to a raster image (such as a PNG). Analyse the resulting image by sampling its pixels: either average them, if an average is what you're after, or use a threshold to group pixels into "colour groups". Using the average or the most frequent colour group (which method to use depends on what matters most to you), you can get a fairly accurate representation of the predominant colour of the page.

[1] http://cutycapt.sourceforge.net/
[2] http://weblogs.mozillazine.org/roc/archives/2005/05/rendering_web_p.html
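To make the two options concrete, here is a minimal sketch of both, assuming the screenshot has already been decoded into a flat RGBA byte array (4 bytes per pixel), which is what most PNG decoders hand back. The decoding step is not shown, and the function names and the group width of 64 are made up for illustration.

```javascript
// Average every channel over all pixels of a flat RGBA buffer.
function averageColour(rgba) {
  const pixelCount = rgba.length / 4;
  if (pixelCount === 0) return null;
  const sum = [0, 0, 0];
  for (let i = 0; i < rgba.length; i += 4) {
    sum[0] += rgba[i];
    sum[1] += rgba[i + 1];
    sum[2] += rgba[i + 2];
  }
  return sum.map(total => Math.round(total / pixelCount));
}

// Put each pixel into a "colour group" (a cube of width `step` in RGB space)
// and return the centre of the group that contains the most pixels.
function mostCommonColourGroup(rgba, step = 64) {
  const groups = new Map();
  for (let i = 0; i < rgba.length; i += 4) {
    const key = [rgba[i], rgba[i + 1], rgba[i + 2]]
      .map(c => Math.floor(c / step))
      .join(',');
    groups.set(key, (groups.get(key) || 0) + 1);
  }
  let bestKey = null;
  let bestCount = 0;
  for (const [key, count] of groups) {
    if (count > bestCount) {
      bestKey = key;
      bestCount = count;
    }
  }
  if (bestKey === null) return null;
  return bestKey.split(',').map(n => Number(n) * step + step / 2);
}

// Three reddish pixels, one white and one blue (alpha is ignored).
const rgba = Uint8Array.from([
  255, 0, 0, 255,
  250, 10, 5, 255,
  240, 20, 20, 255,
  255, 255, 255, 255,
  0, 0, 255, 255,
]);
console.log(averageColour(rgba));
console.log(mostCommonColourGroup(rgba));
```

On a sample like this the average is pulled away from red by the white and blue pixels, while the most frequent group stays red, which is probably closer to what the question is after. On a real page the largest group will often be the white or grey background, so it would still need to be filtered out.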

1

Using Node.js, Phantomjs and Color-Thief

Dependencies: Node-canvas (which in turn depends on Cairo), Webshot (which depends on Phantomjs), Color-thief, minor dependencies listed on individual package pages.

Webshot is a light wrapper around PhantomJS, a headless WebKit browser.
You can use it to take a screenshot of your page and receive it as a stream; the sample code below is from the project's GitHub page.

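var webshot = require('webshot');
var fs      = require('fs');

// Render the page with PhantomJS and receive the screenshot as a stream.
webshot('google.com', function(err, renderStream) {
  if (err) {
    return console.error(err);
  }
  var file = fs.createWriteStream('google.png', {encoding: 'binary'});

  // Write each chunk of image data to the file as it arrives.
  renderStream.on('data', function(data) {
    file.write(data.toString('binary'), 'binary');
  });
  // Close the file once the whole screenshot has been received.
  renderStream.on('end', function() {
    file.end();
  });
});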

You can then pass the image to Color-thief, which will extract the required data for you; see the project's samples page for examples.

Etheryte
1

What about taking a screenshot and extracting the predominant colors in that image with something like the GD lib?

Antonio Lopes