I want to write a crawler that collects Facebook users' posts for analysis in my research. After searching for suggested approaches, I am using this code to log in to Facebook:

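// HtmlUnit 2.x package names (3.x moved these to org.htmlunit)
import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import com.gargoylesoftware.htmlunit.html.HtmlPasswordInput;
import com.gargoylesoftware.htmlunit.html.HtmlSubmitInput;
import com.gargoylesoftware.htmlunit.html.HtmlTextInput;

public class fbcrawler {
    public static void main(String[] args) throws Exception {
        String email = "";
        String pwd = "";
        // 1. Log in to Facebook
        WebClient webClient = new WebClient(BrowserVersion.CHROME);
        HtmlPage page = webClient.getPage("http://www.facebook.com");
        HtmlTextInput emailInput = (HtmlTextInput) page.getElementById("email");
        emailInput.setValueAttribute(email);
        HtmlPasswordInput passInput = (HtmlPasswordInput) page.getElementById("pass");
        passInput.setValueAttribute(pwd);
        // The submit input is the first child of the "loginbutton" element
        HtmlSubmitInput submitBtn = (HtmlSubmitInput) page.getElementById("loginbutton").getFirstChild();
        HtmlPage mainPage = submitBtn.click();
        // Dump the page returned after login
        String pageAsXml = mainPage.asXml();
        System.out.println(pageAsXml);
    }
}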

This code logs in to Facebook, but unless I scroll down the page, I can only crawl the first few posts. How can I use HtmlUnit to page down on Facebook?

1 Answer

Try waiting for the background JavaScript to finish; that loads more data:

HtmlSubmitInput submitBtn = (HtmlSubmitInput) page.getElementById("loginbutton").getFirstChild();
HtmlPage mainPage = submitBtn.click();
webClient.waitForBackgroundJavaScript(10_000);
String pageAsXml = mainPage.asXml();

Other ways to wait for AJAX are posted here
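
One such alternative, as a rough sketch rather than anything taken from the linked post: instead of a single fixed wait, poll a condition until it holds or a timeout expires. The class `PollingWait` and the method `waitUntil` are hypothetical names made up for this illustration, not part of HtmlUnit. With HtmlUnit the condition would presumably check the page, for example whether `mainPage.asXml()` has grown since the last check.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper (not an HtmlUnit class): polls a condition until it
// becomes true or the timeout runs out.
final class PollingWait {

    /**
     * Returns true as soon as the condition holds, or false once
     * timeoutMillis has elapsed without it holding.
     */
    static boolean waitUntil(BooleanSupplier condition, long timeoutMillis, long pollMillis)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() >= deadline) {
                return false;
            }
            // Sleep between checks so the loop does not spin
            Thread.sleep(pollMillis);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Stand-in condition: becomes true after roughly 200 ms.
        // With HtmlUnit this could be a check on the loaded page instead
        // (an assumption, not shown here).
        long start = System.currentTimeMillis();
        boolean met = waitUntil(() -> System.currentTimeMillis() - start >= 200, 2_000, 50);
        System.out.println("condition met: " + met);
    }
}
```

The timeout keeps the crawler from hanging if the expected content never arrives; the poll interval is a trade-off between responsiveness and wasted checks.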

Ahmed Ashour