
I'm using CasperJS to scrape a site. I set up a function which stores the image URLs in a variable named images (shown below), and it works great.

images = casper.getElementsAttribute('.search-product-image','src');

I then call that variable in fs so I can export it to a CSV, which also works fine.

casper.then(function() {
    var f = fs.open('e36v10.csv', 'w');
    f.write(imagessplit + String.fromCharCode(13));
    f.close();
});

The issue I just noticed is that not all products have images, so when the scraper hits a product without an image it simply skips it. I need it to at least alert me somehow when that happens, even with something as simple as filler text that says "no image here". I copy that string (along with many other strings) into columns in the CSV, and without filler text for the missing images the order of everything gets messed up. Thanks


Edit

Below is the exact source from the website I am trying to pull from.

A product I can get the image from and my code works fine:

<div class="search-v4-product-image">
    <img alt="238692" class="search-product-image" src="http://d5otzd52uv6zz.cloudfront.net/group.jpg">
    <p class="image-overlay">Generic</p>
</div>

A product with no image, which my scraper passes right by without alerting me:

<div class="search-v4-product-image">&nbsp;</div>
critic
  • As I understand your question, when no image is shown for a product, there isn't even an `img` tag, so that product is missing from the `img` elements selected through `.search-product-image`. If so, you need to provide example HTML for a product with an image and for one without. General advice: select all products without an image through XPath (because it will not be possible with CSS) and write them (your product info) to the file. – Artjom B. Jun 12 '14 at 19:16
  • I edited my post; hopefully you can understand it better now. – critic Jun 23 '14 at 14:05

2 Answers


First I would do images = casper.getElementsInfo('.search-product-image'), which will give you an array of info objects for the elements matching .search-product-image. Then you can iterate over this array and extract the src attribute from each entry with: var src = image.attributes.src

Now that you have the src attribute you can simply check whether it has a value or not. If it does not, you could assign placeholder text to it.
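
A minimal sketch of that check, assuming each entry has the shape getElementsInfo returns (an object with an attributes map). The helper name srcOrPlaceholder, the sample data, and the placeholder text are made up for illustration. Note that this only covers img elements that exist but have no src; a product with no img element at all is never matched by the selector, so it would still be skipped.

```javascript
// Hypothetical helper: returns the src of one element-info object,
// or the placeholder text when the attribute is missing or empty.
function srcOrPlaceholder(info, placeholder) {
    var src = info && info.attributes ? info.attributes.src : undefined;
    return src ? src : placeholder;
}

// Sample data standing in for the result of
// casper.getElementsInfo('.search-product-image')
var images = [
    { attributes: { 'class': 'search-product-image',
                    src: 'http://d5otzd52uv6zz.cloudfront.net/group.jpg' } },
    { attributes: { 'class': 'search-product-image' } } // no src attribute
];

// One value per matched element, in the same order as on the page
var sources = images.map(function (image) {
    return srcOrPlaceholder(image, 'no image here');
});
```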

berrberr

You can write this functionality for the page context this way:

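var fs = require('fs'); // PhantomJS file system module, if not already loaded

casper.then(function(){
    // Runs in the page context: one entry per product image container
    var imgList = this.evaluate(function(){
        var productImages = document.querySelectorAll("div.search-v4-product-image"),
            imageList = [];
        Array.prototype.forEach.call(productImages, function(div){
            if (div.children.length == 0) {
                // no child elements means no img tag for this product
                imageList.push({empty: true});
            } else {
                var img = div.children[0]; // assumes that the image is the first child
                imageList.push({empty: false, src: img.src});
            }
        });
        return imageList;
    });
    // Back in the CasperJS context: one field per product, in page order
    var fields = imgList.map(function(img){
        return img.empty ? "empty" : img.src;
    });
    // join puts the separator between fields only, so columns stay aligned
    fs.write('e36v10.csv', fields.join(";"), 'w');
});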

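This iterates over all product image divs and pushes one entry per div to an array: the src when the div contains an image, and an entry marked empty when it does not. You can check the empty property of every entry to see which products have no image.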

I suspect that the output would be more meaningful if you iterated over all product divs and checked each one this way, because then you could also write the product name to the CSV.

You could use CSS selectors, but then you would need to make the :nth-child selection much higher in the hierarchy (at the product div list). This is because :nth-child only works relative to its parent and not over the whole tree.
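
To illustrate the idea of writing one row per product, here is a rough sketch of the CSV-building step only. It assumes you have already collected one record per product from the page; the record shape ({name, src}), the function name toCsv, and the sample product names are hypothetical, since the question does not show the markup for product names.

```javascript
// Hypothetical record shape: { name: string, src: string or null }.
// In CasperJS such records would be returned from this.evaluate(...).
function toCsv(products, placeholder) {
    return products.map(function (p) {
        // Fall back to the placeholder so every row has the same columns
        return [p.name, p.src || placeholder].join(';');
    }).join('\n');
}

// Made-up sample data for illustration
var products = [
    { name: 'Product A', src: 'http://d5otzd52uv6zz.cloudfront.net/group.jpg' },
    { name: 'Product B', src: null } // product without an image
];

var csv = toCsv(products, 'no image here');
```

The result could then be written with fs.write('e36v10.csv', csv, 'w'). The sketch does not escape values that themselves contain a semicolon or a line break.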

Artjom B.