
I have a very long HTML document that I want to scrape using Cheerio.js. I would like to do this more efficiently than loading the entire HTML, since I only need to scrape one specific tag from it.

The tag is:

<a class="uniqueClass" .....>
    ... here there might be multiple other tags.
</a>

Please note that I do NOT need help with selecting that tag with Cheerio or working with it afterwards. I only want a way to load it more efficiently instead of loading the entire large HTML.

  • Efficiency is very important.

Thanks for the help!

TBE

1 Answer


There is no way, since you know nothing about the requested resource until you have a response, and that response contains the entire source code. You therefore cannot select or scrape just one part or tag without first loading everything.

Andrés Pérez-Albela H.
  • Here is a solution in pseudocode, but I need help with the implementation: 1. Get the HTML. 2. Instead of loading the entire HTML string, find the substring that contains your tag. 3. Load the substring into Cheerio. – TBE Nov 29 '15 at 08:37
  • You could load the entire page in Cheerio, get the part you want, create another cheerio instance with only the part you selected and delete the first instance. – Shanoor Nov 29 '15 at 08:41
  • Even if you want to split it or find something inside it, you'll need to grab the whole source code first. Then you can read it line by line or do whatever you want with it. – Andrés Pérez-Albela H. Nov 29 '15 at 08:41
  • This is pretty much what I'm after: once I have the source code as a string, I want to reduce the string to just the tag I need. This will save me from loading the entire LONG HTML into a Cheerio.js object. – TBE Nov 29 '15 at 11:53
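
The substring idea from the comments could be sketched roughly as follows. This is a minimal sketch, not a tested solution from the thread. The helper name `extractAnchorByClass` is hypothetical. It assumes the tag is written in lowercase, that the class attribute is quoted, that no attribute value contains a `>` character, and that anchors are not nested (which valid HTML does not allow), so the first `</a>` after the opening tag closes it.

```javascript
// Hypothetical helper (not part of Cheerio): returns the substring holding the
// first <a> element whose class attribute lists `className`, or null if absent.
function extractAnchorByClass(html, className) {
  // Escape regex metacharacters so the class name is matched literally.
  const escaped = className.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

  // Opening <a ...> tag with a quoted class attribute in which `className`
  // appears as a whole whitespace-separated token.
  const openTag = new RegExp(
    `<a(?=\\s)[^>]*?\\sclass\\s*=\\s*["'](?:[^"']*\\s)?${escaped}(?:\\s[^"']*)?["'][^>]*>`
  );
  const match = openTag.exec(html);
  if (match === null) return null;

  // Assumption: anchors are not nested, so the first </a> closes this tag.
  const closeTag = "</a>";
  const end = html.indexOf(closeTag, match.index);
  if (end === -1) return null;

  return html.slice(match.index, end + closeTag.length);
}

// Usage (assumes the cheerio package is installed):
// const cheerio = require("cheerio");
// const fragment = extractAnchorByClass(longHtml, "uniqueClass");
// const $ = fragment === null ? null : cheerio.load(fragment);
```

The whole response still has to be downloaded, as the answer points out. Only the parsing step is limited to the fragment. Whether that is actually faster than parsing the full document would need to be measured on the real pages.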