crawler4j - I can't get the title

Question

In short: I can't get the title of this URL: http://www.namlihipermarketleri.com.tr/default.asp?git=9&urun=10277 (the link is broken as of 18-11-2015).

In my WebCrawler implementation:

     @Override
     public void visit(Page page) {          
         System.out.println(page.getWebURL().getURL()); // this prints the URL correctly
         if (page.getParseData() instanceof HtmlParseData) {
             HtmlParseData htmlParseData = (HtmlParseData) page.getParseData();
             System.out.println(htmlParseData.getTitle()); // This line prints an empty line!
         }
     }

Note: the title itself contains some commas (","). Can you suggest a solution? Is this a bug?

Thanks in advance.

score 2 · Accepted Answer · answered Jul 09 '15 at 09:31

The problem was probably that there were 4 title tags in the HTML document.

I used Jsoup (http://jsoup.org/) to parse the raw HTML myself and read the first title tag:

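    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.select.Elements;

    HtmlParseData htmlParseData = (HtmlParseData) page.getParseData();
    Document htmlDocument = Jsoup.parse(htmlParseData.getHtml());
    // Take the first <title> element; guard against pages that have none
    Elements titles = htmlDocument.getElementsByTag("title");
    String title = titles.isEmpty() ? "" : titles.get(0).text();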
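If you want to check the multiple-title explanation without adding a dependency, here is a rough, stdlib-only sketch. It is not the answer's method: it collects every `<title>` element with a regular expression and returns the first non-blank one. The class and method names (`FirstTitle`, `allTitles`, `firstTitle`) and the sample HTML are hypothetical. A regex is not an HTML parser, and unlike Jsoup's `text()` it does not decode entities such as `&amp;`, so Jsoup remains the more robust choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical, dependency-free sketch: find every <title>...</title> in raw
 * HTML and return the first non-blank one. A regex is NOT a real HTML parser
 * (no entity decoding, no awareness of comments or scripts).
 */
public class FirstTitle {

    // Case-insensitive; DOTALL so a title spanning several lines still matches.
    private static final Pattern TITLE = Pattern.compile(
            "<title[^>]*>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns the trimmed text of every <title> element, in document order. */
    public static List<String> allTitles(String html) {
        List<String> titles = new ArrayList<>();
        Matcher m = TITLE.matcher(html);
        while (m.find()) {
            titles.add(m.group(1).trim());
        }
        return titles;
    }

    /** Returns the first non-blank title, or "" if there is none. */
    public static String firstTitle(String html) {
        for (String t : allTitles(html)) {
            if (!t.isEmpty()) {
                return t;
            }
        }
        return "";
    }

    public static void main(String[] args) {
        // Hypothetical page with three <title> tags, the first one empty.
        String html = "<html><head><title></title><title>Shop, Product, Price</title></head>"
                + "<body><svg><title>icon</title></svg></body></html>";
        System.out.println(allTitles(html).size()); // 3
        System.out.println(firstTitle(html));       // Shop, Product, Price
    }
}
```

Printing `allTitles(...)` for the crawled page shows how many title tags it really has. Skipping blank titles is a design choice here; the answer's Jsoup code simply takes the first one. The commas in the sample title have no effect on either approach.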
