
I am building a crawler with cheerio and Node.js, and I am trying to extract a link tag that has no closing tag. It looks like this:

<item>
   <link>http://www.example.com
   <description>...</description>
</item>

How would I extract that link? Trying to extract the text of the link tag doesn't return anything.

cj2415

1 Answer


You need a parser that takes the dirty HTML input and sanitizes it. You can pass DOMPurify a string of dirty HTML and it will return a string of clean HTML.

Example of clean HTML: a closing dd tag that is missing from the input is added in the output.

This clean HTML can then be loaded into cheerio. See the DOMPurify documentation for more.
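For the specific markup in the question, the repair step can also be sketched without any library. This is a regex-based alternative, not the DOMPurify approach described above. It assumes the link tag has no attributes and that the URL text contains no `<` character. The helper names `extractLinks` and `closeLinkTags` are hypothetical.

```javascript
// Hypothetical helper: collect the text that follows each <link> tag,
// up to the next tag, whether or not </link> is present.
function extractLinks(markup) {
  const links = [];
  for (const match of markup.matchAll(/<link>([^<]*)/g)) {
    links.push(match[1].trim());
  }
  return links;
}

// Hypothetical helper: insert the missing </link> so a parser sees a
// normal element. Links that are already closed are left untouched.
function closeLinkTags(markup) {
  // The lookahead requires the next tag to be something other than </link>.
  return markup.replace(/<link>([^<]*)(?=<(?!\/link>))/g, (whole, text) => {
    const trailing = text.slice(text.trimEnd().length); // keep original indentation
    return `<link>${text.trim()}</link>${trailing}`;
  });
}

const feed = [
  '<item>',
  '   <link>http://www.example.com',
  '   <description>...</description>',
  '</item>',
].join('\n');

console.log(extractLinks(feed)); // [ 'http://www.example.com' ]
console.log(closeLinkTags(feed));
```

The repaired string can then be loaded into cheerio. Passing `{ xmlMode: true }` to `cheerio.load` is likely needed, because in HTML mode link is treated as a void element that cannot hold text, which would also explain why the original attempt returned nothing.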

  • 1
    Hi Pranay. Please do not link out to an example answer. It would be much more helpful to include that HTML in your answer. – Mike Poole Apr 16 '20 at 07:54