Selenium takes a lot of time to get the dynamic page of a given URL

Question

I am working on a Java project in which I have to work with the DOM. To do that, I first load the dynamic page for a given URL using Selenium, then parse it using Jsoup.

I want to get the dynamic page source code of a given URL.

Code snapshot:

public static void main(String[] args) throws IOException {

     // Selenium
     WebDriver driver = new FirefoxDriver();
     driver.get("ANY URL HERE");  
     String html_content = driver.getPageSource();
     driver.close();

     // Jsoup makes DOM here by parsing HTML content
     Document doc = Jsoup.parse(html_content);

     // OPERATIONS USING DOM TREE
}

The problem is that Selenium takes around 95% of the whole processing time, which is undesirable.

Selenium first opens Firefox, then loads the given page, and then gets the dynamic page source code.

How can I reduce the time taken by Selenium, or replace it with a more efficient tool? Any other advice would also be welcome.

Edit No. 1

There is some code given on this link.

FirefoxProfile profile = new FirefoxProfile();
profile.setPreference("general.useragent.override", "some UA string");
WebDriver driver = new FirefoxDriver(profile);

But I don't understand what the second line does, and the Selenium documentation is not much help here.

Edit No. 2

    System.out.println("Fetching " + url1 + "...");
    System.out.println("Fetching " + url2 + "...");

    // One Firefox instance is reused for both URLs
    WebDriver driver = new FirefoxDriver(createFirefoxProfile());

    driver.get(url1);
    String html1 = driver.getPageSource();

    driver.get(url2);
    String html2 = driver.getPageSource();
    driver.close();

    Document doc1 = Jsoup.parse(html1);
    Document doc2 = Jsoup.parse(html2);

@KDM Can you elaborate on this? I am new to this field, so please explain. – devsda, Apr 05 '13 at 09:45
When you create a WebDriver using `new FirefoxDriver()`, Selenium creates a new Firefox profile. That in itself is a costly operation. You can pass a FirefoxProfile object to the constructor, which avoids creating a new profile every time. I will try to put together some code. – Dakshinamurthy Karra, Apr 05 '13 at 09:47
@KDM I added Edit No. 1, please see it. It shows some code, but it is beyond my understanding, so please explain how I can do this. – devsda, Apr 05 '13 at 10:03

score 1 · Accepted Answer · answered Apr 05 '13 at 10:13

Try this:

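import java.io.File;
import java.io.IOException;

import org.apache.commons.io.FileUtils;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;
import org.openqa.selenium.firefox.FirefoxProfile;

public static void main(String[] args) throws IOException {

    // Selenium: start Firefox with a cached profile instead of a fresh one
    WebDriver driver = new FirefoxDriver(createFirefoxProfile());
    driver.get("ANY URL HERE");
    String html_content = driver.getPageSource();
    driver.close();

    // Jsoup makes DOM here by parsing HTML content
    // OPERATIONS USING DOM TREE
}

private static FirefoxProfile createFirefoxProfile() {
    File profileDir = new File("/tmp/firefox-profile-dir");
    // Reuse the saved profile directory if an earlier run already created it
    if (profileDir.exists())
        return new FirefoxProfile(profileDir);
    // First run: let Selenium build a profile, then keep a copy for later runs
    FirefoxProfile firefoxProfile = new FirefoxProfile();
    File dir = firefoxProfile.layoutOnDisk();
    try {
        profileDir.mkdirs();
        FileUtils.copyDirectory(dir, profileDir);
    } catch (IOException e) {
        e.printStackTrace();
    }
    return firefoxProfile;
}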

The createFirefoxProfile() method creates a profile if one doesn't exist, and reuses the saved profile if one already exists. That way Selenium doesn't need to create the profile directory structure every time.
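To check whether reusing the profile actually helps, it may be worth timing the Selenium phase and the Jsoup phase separately. The following is only a minimal sketch: `PhaseTimer`, `Timed`, and `time` are hypothetical names that do not come from Selenium or Jsoup, and the two tasks in `main` are stand-ins (a short sleep and a string operation) so that the sketch runs without a browser. In a real project, the first task would wrap the driver creation, `driver.get(...)`, and `getPageSource()` calls, and the second would wrap `Jsoup.parse(...)`.

```java
import java.util.concurrent.Callable;

public class PhaseTimer {

    /** Result of a timed call: the value it returned and the elapsed wall-clock time. */
    public static final class Timed<T> {
        public final T value;
        public final long millis;

        Timed(T value, long millis) {
            this.value = value;
            this.millis = millis;
        }
    }

    /** Runs the task once and measures elapsed time with the monotonic System.nanoTime clock. */
    public static <T> Timed<T> time(Callable<T> task) throws Exception {
        long start = System.nanoTime();
        T value = task.call();
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        return new Timed<>(value, elapsedMillis);
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the Selenium phase (driver start + get + getPageSource).
        // The sleep only simulates a slow step; it is an assumption, not real browser work.
        Timed<String> fetch = time(() -> {
            Thread.sleep(50);
            return "<html><body>stub</body></html>";
        });

        // Stand-in for the Jsoup phase (parsing and the DOM operations).
        Timed<Integer> parse = time(() -> fetch.value.length());

        System.out.println("fetch: " + fetch.millis + " ms, parse: " + parse.millis + " ms");
    }
}
```

Running the same measurement once with `new FirefoxDriver()` and once with `new FirefoxDriver(createFirefoxProfile())` would show how much of the total time the profile creation accounts for.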

Dakshinamurthy Karra


Thanks. I will add this module and check how it affects my project. – devsda Apr 05 '13 at 10:17
`FileUtils.copyDirectory(dir, profileDir);`: NetBeans suggests creating a class FileUtils. I think there is some mistake. Please take a look. – devsda Apr 05 '13 at 10:21
It is from Apache Commons IO. Selenium also uses it, so add the JAR to your project. – Dakshinamurthy Karra Apr 05 '13 at 10:24
I added this dependency: `<dependency><groupId>org.apache.commons</groupId><artifactId>commons-io</artifactId><version>1.3.2</version></dependency>`. But it throws the same error. – devsda Apr 05 '13 at 10:30
I am on Eclipse. The JAR I added is commons-io-2.2.jar. – Dakshinamurthy Karra Apr 05 '13 at 10:32
Yes, it works in NetBeans too, but with Maven it shows an error. – devsda Apr 05 '13 at 10:33
Can you tell me how this function helps in my case? – devsda Apr 05 '13 at 10:35
Time it and see whether it helps. When you create a Firefox driver, Selenium creates a new profile for the Firefox instance. Each time you create a WebDriver, the profile is recreated. This function avoids creating a Firefox profile every time. – Dakshinamurthy Karra Apr 05 '13 at 10:37
Thanks a lot. I am trying to run this with Maven. If I find any problem, I will let you know. Thanks again. – devsda Apr 05 '13 at 10:39
Now it works with Maven too. In my algorithm I have to open two URLs. It performs fine: I get the first URL, then the second one, and it is faster than before. Can it be made more efficient? See my code in Edit No. 2. – devsda Apr 05 '13 at 11:13
I followed all of your instructions. I observe that it takes around `13 - 18 seconds` to get the dynamic page source of two URLs, but I want this task to take around 1 - 2 seconds, or at least a reasonable time. How can I get the dynamic pages of two URLs efficiently? Please see my main problem; this question is one part of making that algorithm efficient. http://stackoverflow.com/questions/15718235/optimized-algorithm-to-compare-templates-of-two-urls – devsda Apr 05 '13 at 12:30
If you have a little time, can we discuss my problem via chat, please? – devsda Apr 05 '13 at 12:30
let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/27644/discussion-between-kdm-and-jhamb) – Dakshinamurthy Karra Apr 05 '13 at 14:27
Can we discuss this in chat, please? I need your expert advice. Please spare some time. – devsda Apr 06 '13 at 07:46
I want to use HtmlUnit to get the dynamic page source code, because Selenium takes around 13 - 18 seconds to get the page source of two URLs. I tried to write code using HtmlUnit, but it is not working. Do you know a good tutorial for this, or could you help me write the code? Please give some guidance. – devsda Apr 06 '13 at 10:43
I followed your instructions, and perform the same thing using GhostDriver + PhantomJs, but there is not much difference in the time occurs. What can I do now? Here is my code, which has one error; please take a look. – http://stackoverflow.com/questions/15852687/why-code-not-exits-at-the-end-after-closing-the-driver-ghostdriver-phantomjs – devsda Apr 06 '13 at 19:45
My code is ready and it ran fine for two URLs, but when I test it with a larger number of URLs, it fails. Can you please help me? The question link is http://stackoverflow.com/questions/16075837/shows-exception-in-java-code-selenium-jsoup – devsda Apr 18 '13 at 07:31

score 0 · Answer 2 · answered Feb 23 '15 at 14:15

If you are confident in your code, you can go with PhantomJS. It is a headless browser, so it should return your results more quickly. Firefox takes time to start up and execute.

divine


This late answer may be of limited use. One of devnull's comments, on 6 April 2013, was: "I followed your instructions, and perform the same thing using GhostDriver + PhantomJs, but there is not much difference in the time occurs" – aberna Feb 23 '15 at 14:18
