
Here is what I am trying to accomplish. I can scrape a web page and extract the information I need, and I have already run this on a couple of websites where the pagination links are readily available in the href attribute. My question is: how do I navigate to the next page when the pagination link is dynamic, as in this markup:

<ul>
    <li>
        <a class="clickPage" href="javascript:previousPage()">1</a>
    </li>
    <li>
        <a class="clickPage active" href="javascript:currentPage()">2</a>
    </li>
    <li>
        <a class="clickPage" href="javascript:nextPage()">Next Page</a>
    </li>
</ul>

Here is the code I have working for other sites:

var request = require('request'),       // simplified HTTP request client
    cheerio = require('cheerio'),       // lean implementation of core jQuery
    Xray = require('x-ray'),            // web scraper
    x = Xray(),
    fs = require('fs');                 // file system i/o

/*
    TODO: Make this feature dynamic, to take in the URL of the page
    var pageUrl;
*/

var status = 'for sale';
var counter = 0;

x('http://www.example.com/results/1', '.results', [{
    id: 'div.grid@id',    // extracts the value from the attribute id
    title: 'div.info h2',
    category: 'span.category',
    price: 'p.price',
    count: counter+1,    // why doesn't this update? it never shows up in the JSON
    status: status       // this value never shows up in the JSON either
}])
  .paginate(whatShouldThisBe)
  .limit(800)
  .write('products.json');

Also, the values of count and status never show up in the generated JSON file. I'm not sure what I am doing wrong here, and I would appreciate any help.

Thanks!

johnanish

1 Answer


Have you tried .paginate('ul li:nth-child(3) a@href')?

This selects the link inside the third <li> of the <ul> and reads its href attribute.
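One caveat: with the markup in the question, that href is the string javascript:nextPage(), which is not a URL that can be fetched. A minimal sketch of one way around it, assuming the site numbers its result pages in the URL path the way the start URL (/results/1) suggests. Both helper names are hypothetical, and the URL pattern is an assumption that has to be checked against the real site:

```javascript
// Hypothetical helper: true only if the extracted href looks like something
// that can be requested, i.e. it is non-empty and not a javascript: pseudo-URL.
function isNavigableHref(href) {
  if (typeof href !== 'string' || href.trim() === '') {
    return false;
  }
  return !/^\s*javascript:/i.test(href);
}

// Hypothetical helper: derive the next page URL by incrementing a trailing
// page number. ASSUMPTION: pages are addressed as .../results/1, .../results/2, ...
// Returns null when the URL does not end in a number.
function nextPageUrl(currentUrl) {
  var match = /^(.*\/)(\d+)\/?$/.exec(currentUrl);
  if (!match) {
    return null;
  }
  return match[1] + (parseInt(match[2], 10) + 1);
}

var href = 'javascript:nextPage()';              // what the selector would extract here
var current = 'http://www.example.com/results/1';

// Fall back to the numbered URL when the href cannot be followed.
var next = isNavigableHref(href) ? href : nextPageUrl(current);
console.log(next);
```

If the page number is not part of the URL at all, this will not work, and something that actually executes the page's JavaScript (a headless browser, for example) is probably needed instead.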

pietrovismara
  • Thanks for letting me know. I have tried this: `.paginate('ul li:nth-child a@href')`. However, I noticed you have omitted the attribute value ("@href"). Is there a reason that was done? Just a gentle reminder, the links are created on the fly (onClick). – johnanish Jan 30 '17 at 23:53
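
On the count and status fields from the question: counter+1 is evaluated once, when the object literal is built, so it could only ever be the constant 1. As far as I can tell, x-ray also treats string values in the schema as selectors, so 'for sale' would be looked up as a selector and match nothing; that part is my assumption and not something confirmed here. A minimal sketch of adding both fields after scraping instead, in plain JavaScript (annotate is a hypothetical helper, not part of x-ray):

```javascript
// Hypothetical helper: return copies of the scraped items with a 1-based
// running count and a constant status added to each one.
function annotate(items, status) {
  return items.map(function (item, index) {
    return Object.assign({}, item, { count: index + 1, status: status });
  });
}

// Stand-in for scraped results; real items would come from the scraper.
var scraped = [
  { title: 'First item', price: '$10' },
  { title: 'Second item', price: '$12' }
];

var annotated = annotate(scraped, 'for sale');
console.log(JSON.stringify(annotated, null, 2));
```

To use this with the code in the question, the results would need to be received in a callback and written out with fs rather than through .write('products.json').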