Cheerio extract a link without a closing tag

Question

I am building a crawler with Cheerio and Node.js, and I am trying to extract the text of a <link> element that has no closing tag. It looks like this:

<item>
   <link>http://www.example.com
   <description>...</description>
</item>

How would I extract that link? Trying to extract the text of the link tag doesn't return anything.

score 0 · Answer 1 · answered Apr 16 '20 at 07:45

You need a parser that takes the dirty HTML as input and sanitizes it. You can pass DOMPurify a string of dirty HTML and it will return a string of clean HTML.

Example of clean HTML

In the example, the closing dd tag is missing from the input and is added in the output. The clean HTML can then be loaded into Cheerio.

More on DOMPurify
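For the snippet in the question, a lighter-weight variant of the same "clean first, then load" idea might look like the sketch below. It is not DOMPurify; it is a plain string repair that needs no extra packages. The likely reason .text() comes back empty is that <link> is a void element in HTML, so the URL ends up as a sibling text node rather than as the element's content. closeLinkTags is a hypothetical helper name, and the sketch assumes the link text never contains a "<" character.

```javascript
// Hypothetical helper (not part of Cheerio or DOMPurify): adds the missing
// </link> after the text that follows each unclosed <link>.
// Assumption: the link text itself never contains "<".
function closeLinkTags(markup) {
  // [^<]* captures the text after <link>; the lookahead requires the next
  // tag to be something other than </link>, so closed tags are left alone.
  return markup.replace(/<link>([^<]*)(?=<(?!\/link>))/g, (match, text) => {
    const trailing = text.match(/\s*$/)[0]; // keep the original line break and indentation
    return `<link>${text.trim()}</link>${trailing}`;
  });
}

// The markup from the question.
const sample = [
  "<item>",
  "   <link>http://www.example.com",
  "   <description>...</description>",
  "</item>",
].join("\n");

const fixed = closeLinkTags(sample);
console.log(fixed);

// With cheerio installed, the repaired string can then be parsed in XML mode
// so that <link> is treated as an ordinary element:
//   const $ = require("cheerio").load(fixed, { xmlMode: true });
//   $("item > link").text();
```

This only repairs the one tag that is known to be broken, so anything messier than that is probably better handled by a real sanitizer as described above.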

Pranay Usgaonkar

Hi Pranay. Please do not link out to an example answer. It would be much more helpful to include that HTML in your answer. – Mike Poole Apr 16 '20 at 07:54
