I'm a beginner with JavaScript and Node.js. I'm writing a simple program that loads and parses an array of URLs using the request and cheerio modules. Parsing the array, which contains about 700 URLs, takes a really long time: about 40 seconds. So I'm looking for something that can help me speed up execution. I tried the Parallel module with map, but that leaves my request function undefined for some unknown reason... I've also heard about the async module, but as far as I know, JavaScript still remains single-threaded with it. Could someone advise me on what to use to reduce the execution time, please?

Here is my attempt:

var p = new Parallel(Arr);
p.map(GetGenres);

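var GetGenres = function (value) {
    request(value.url, function (err, res, body) {
        // Skip this URL if the request failed; body is undefined in that case.
        if (err) {
            return;
        }
        var $ = cheerio.load(body, { xmlMode: true });
        value.genres = [];
        // Collect the text of each link inside the div that holds the "Genres:" label.
        $('div').has('span:contains("Genres:")')
            .children('a')
            .each(function (i, element) {
                value.genres.push($(this).text());
            });
    });
};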
detaylor
K.Rice
    40 seconds to request and parse 700 pages? While the .has() selector seems complicated, I'm guessing the time is mostly spent downloading 700 pages. – Sam R May 15 '16 at 12:39
    Just loading without doing any searching on page takes me about 14 seconds. – K.Rice May 15 '16 at 12:51
    Unfortunately, regex (which you should never use to parse HTML) will be faster than cheerio in this case (not counting network latency, of course) xD – YOU May 15 '16 at 13:01
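
If, as the comments suggest, most of the 40 seconds is spent waiting on the network rather than parsing, then a single-threaded event loop is not necessarily the bottleneck: several requests can be in flight at once while Node waits on I/O. Below is a minimal sketch of a concurrency-limited map in plain JavaScript. The names `mapWithLimit`, `limit`, and `worker` are illustrative and not from the question, and whether this actually helps depends on the target server and the network.

```javascript
// Sketch of a promise pool: runs `worker` over `items` with at most
// `limit` calls in flight at any time. All names here are hypothetical.
function mapWithLimit(items, limit, worker) {
  var results = new Array(items.length);
  var next = 0; // index of the next item that has not been started yet

  // Each "lane" repeatedly takes the next unstarted item until none remain.
  function lane() {
    if (next >= items.length) {
      return Promise.resolve();
    }
    var i = next++;
    return Promise.resolve()
      .then(function () { return worker(items[i], i); })
      .then(function (value) { results[i] = value; })
      .then(lane);
  }

  // Start up to `limit` lanes; together they drain the whole array.
  var lanes = [];
  for (var k = 0; k < Math.min(limit, items.length); k++) {
    lanes.push(lane());
  }

  // Resolves with results in input order; rejects if any worker rejects.
  return Promise.all(lanes).then(function () { return results; });
}
```

To use it with the code from the question, `worker` would need to return a Promise that resolves once the `request` callback has finished filling in `value.genres`; a `limit` somewhere around 10 to 20 is only a guess and would have to be tuned by measurement.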
