
I am facing a problem where crawler4j does not recognize all the links on a page.

For example, if a page contains 5 links, only 3 of them are recognized and fetched. The other 2 are not recognized at all.

What is the expected output? What do you see instead? All the links on a page should be recognized so that they can be fetched.

What version of the product are you using? crawler4j 4.1

Please provide any additional information below. The only difference I found is that the links which are not recognized contain angle brackets in the URL.

Example:

<a title="some text" href="http://www.example.com/abc/xyz-<sometext>-abc-xyz/abc_xyz" >some text</a>

1 Answer

Yes, this looks like a bug in the crawler4j page parser.

It finds the tag and then searches for a closing bracket. I assume this is the failure point, since the > inside the href value would be mistaken for the end of the tag.
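To illustrate the suspected failure mode, here is a minimal, self-contained sketch. It is not crawler4j's actual parser code; the class and method names (HrefExtractionSketch, naiveOpeningTag, extractHrefs) are hypothetical. It contrasts a naive scan that stops at the first > with a quote-aware extraction that reads the whole double-quoted href value and percent-encodes the angle brackets.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HrefExtractionSketch {

    // Naive approach (the assumed failure mode): treat the first '>' after
    // "<a " as the end of the opening tag, even if it sits inside a quoted value.
    static String naiveOpeningTag(String html) {
        int start = html.indexOf("<a ");
        if (start < 0) {
            return "";
        }
        int end = html.indexOf('>', start);
        return end < 0 ? "" : html.substring(start, end + 1);
    }

    // Quote-aware approach: capture everything between the double quotes of href,
    // so '<' and '>' inside the value do not terminate the match.
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    static List<String> extractHrefs(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            // Percent-encode the angle brackets so the result is a valid URL.
            links.add(m.group(1).replace("<", "%3C").replace(">", "%3E"));
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<a title=\"some text\" "
                + "href=\"http://www.example.com/abc/xyz-<sometext>-abc-xyz/abc_xyz\" >some text</a>";
        // The naive scan cuts the tag off in the middle of the href value.
        System.out.println(naiveOpeningTag(html));
        // The quote-aware scan recovers the full link.
        System.out.println(extractHrefs(html));
    }
}
```

A regular expression is not a general HTML parser; this only handles double-quoted href values and is meant as a stopgap for pre-processing pages or for checking which links are being missed.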

Please submit an issue on the new crawler4j site on GitHub: https://github.com/yasserg/crawler4j/issues

Thanks

Chaiavi