How to retrieve images (decoded if possible) present in a webpage using XPCOM

Question

How can I get all the images on a webpage, decoded if possible, through XPCOM?

The images might be specified as a background URL in some CSS property, in an img tag, or in any other form that a web developer might have used.

I looked into imgIContainer, imgIDecodeObserver and many other interfaces. There is a way to hand an image URI to Mozilla so that it loads the image, decodes it and returns an imgIContainer, but I couldn't find any way to get all the images in the current webpage.

This has to be done in either Java or JavaScript.

Any suggestions?

@Wladimir - Thanks for your help.

I want all the images including CSS constructs (background images). So now I am listening to events from nsIWebProgressListener.


    onStateChange: function(webProgress, request, stateFlags, status) {
        if ((~stateFlags & (nsIWebProgressListener.STATE_IS_REQUEST | nsIWebProgressListener.STATE_STOP)) == 0) {
            var imgReq = request.QueryInterface(CI.imgIRequest);
            if (imgReq)
                var img = imgReq.image;
        }
    }

The problem is that request.QueryInterface(CI.imgIRequest) throws an exception for every non-image request. Those exceptions could be ignored by putting the code inside a try-catch block, but I'd prefer to do things cleanly.

Is there a condition I can check to tell whether a request is for an image or not?

Wladimir Palant · Accepted Answer · 2012-05-15T09:16:59.610


There is existing code that you can look at. The Page Info dialog has a Media tab that successfully shows most images on the page. The important function is grabAll() in pageInfo.js, which is called for each element (via a TreeWalker). As you can see, there is no generic way to get the images. Instead, this function uses window.getComputedStyle() to extract the values of a handful of CSS properties for the element: background-image, border-image, list-style-image, cursor. It also looks for <img>, <svg:image>, <link> (favicon), <input>, <button>, <object> and <embed> tags. It doesn't manage to recognize everything, however. For example, these CSS constructs will not be recognized:

    .foo:before
    {
      content: url(image.png);
    }
    .foo:hover
    {
      background-image: url(image.png);
    }

Still, this is probably as far as you can get - unless you want to look at the requests made by the web page as it loads.
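As a rough illustration of the element-by-element approach described above, here is a minimal sketch. It is not the code from pageInfo.js: the names extractCssUrls, collectImageUrls and CSS_IMAGE_PROPS are made up for this example, only <img> tags and a few CSS properties are checked, and using border-image-source in place of the border-image shorthand is an assumption that may not hold in every Gecko version.

```javascript
// Hypothetical helper: pull every url(...) target out of a computed CSS
// value such as 'url("a.png"), url(b.png)'. Returns an array of strings.
function extractCssUrls(value) {
  var urls = [];
  var re = /url\(\s*(?:"([^"]*)"|'([^']*)'|([^)\s]*))\s*\)/g;
  var m;
  while ((m = re.exec(value || "")) !== null) {
    var url = m[1] !== undefined ? m[1] : m[2] !== undefined ? m[2] : m[3];
    if (url) urls.push(url);
  }
  return urls;
}

// Assumed property list; the real Page Info code may check different ones.
var CSS_IMAGE_PROPS = ["background-image", "border-image-source",
                       "list-style-image", "cursor"];

// Hypothetical helper: walk all elements of an already loaded document and
// collect image URLs from <img> tags and from computed styles.
function collectImageUrls(doc) {
  var win = doc.defaultView;
  var found = {};
  // 1 is the value of NodeFilter.SHOW_ELEMENT
  var walker = doc.createTreeWalker(doc.documentElement, 1, null, false);
  for (var el = walker.currentNode; el; el = walker.nextNode()) {
    if (el.localName === "img" && el.src) found[el.src] = true;
    var style = win.getComputedStyle(el, null);
    if (!style) continue; // can be null for elements that are not rendered
    CSS_IMAGE_PROPS.forEach(function (prop) {
      extractCssUrls(style.getPropertyValue(prop)).forEach(function (u) {
        found[u] = true;
      });
    });
  }
  return Object.keys(found);
}
```

Calling collectImageUrls(content.document) from chrome code would be one way to use it. Like the Page Info dialog, it would still miss the :before and :hover cases shown above.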

Edit: If you look at the requests as they are performed (via a web progress listener), you can do the following:

    if (request instanceof CI.imgIRequest)
      var img = request.URI.spec;

Note that request.image won't help you much, because almost all methods of imgIContainer are only accessible from native code.
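Put together with the flag test from the question, the listener could look roughly like the sketch below. The factory name makeImageListener and the onImageUrl callback are invented for this example, and Ci is passed in as a parameter (in an extension it would be Components.interfaces). Registering the listener and the remaining nsIWebProgressListener methods are left out.

```javascript
// Hypothetical factory: builds an object whose onStateChange reports the URL
// of every finished image request. "Ci" stands for Components.interfaces.
function makeImageListener(Ci, onImageUrl) {
  var flags = Ci.nsIWebProgressListener;
  var wanted = flags.STATE_IS_REQUEST | flags.STATE_STOP;
  return {
    onStateChange: function (webProgress, request, stateFlags, status) {
      // Only look at individual requests that have just stopped.
      if ((stateFlags & wanted) !== wanted) return;
      // Unlike QueryInterface, instanceof does not throw for
      // non-image requests; it simply evaluates to false.
      if (request instanceof Ci.imgIRequest) {
        onImageUrl(request.URI.spec);
      }
    }
  };
}
```

The instanceof check is what removes the need for a try-catch block around QueryInterface.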

edited May 15 '12 at 09:16

answered May 09 '12 at 19:39

Wladimir Palant


As I said: "unless you want to look at the requests made by the web page as it loads". However, your question sounded like you wanted to extract images from a page that already loaded. – Wladimir Palant May 15 '12 at 06:29
Yes, that's what I wanted. For some reason, I am not able to put my code here with proper formatting. Let me try a few more tricks to put my code here. – Ankur May 15 '12 at 06:31
@Ankur: Please add your own answer and accept it. The "help" link explains how to format code. – Wladimir Palant May 15 '12 at 06:33
Thanks again for your help. I'll try this and let you know the result. Thanks a lot! – Ankur May 15 '12 at 11:23
Thanks a lot. I was able to get imgIContainer objects successfully by observing imgIRequest. – Ankur May 28 '12 at 08:44
