
I am crawling a dynamic web page using Crawljax. I can get the current crawl id, the status and the DOM, but I can't get the website content. Can anyone help me?

CrawljaxConfigurationBuilder builder =
        CrawljaxConfiguration.builderFor("http://demo.crawljax.com/");

builder.addPlugin(new OnNewStatePlugin() {

    @Override
    public String toString() {
        return "Our example plugin";
    }

    @Override
    public void onNewState(CrawlerContext cc, StateVertex sv) {
        // Log the stripped DOM and print the state name, URL and DOM of every new state
        LOG.info("Found a new dom! Here it is:\n{}", cc.getBrowser().getStrippedDom());
        String name = cc.getCurrentState().getName();
        String url = cc.getBrowser().getCurrentUrl();
        System.out.println(cc.getCurrentState().getDom());
        System.out.println("New State: " + name + "; url: " + url);
    }
});

CrawljaxRunner crawljax = new CrawljaxRunner(builder.build());
crawljax.call();

How do I get the content of a dynamic/JavaScript web page?


2 Answers


You can get the website source code with cc.getBrowser().getStrippedDom() or cc.getCurrentState().getDocument(); both return the page source, including the CSS/JavaScript references.

Getting anything beyond the source is not possible, because Crawljax is a testing tool: it only checks whether text is present on a page and fills form fields with temporary data.
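For example, here is a minimal sketch of an OnNewStatePlugin that saves the stripped DOM of every new state to a file (the output directory and file naming are just illustrative assumptions, not a Crawljax convention):

// Requires java.nio.file.Files, java.nio.file.Path, java.nio.file.Paths
// and java.nio.charset.StandardCharsets
builder.addPlugin(new OnNewStatePlugin() {
    @Override
    public void onNewState(CrawlerContext cc, StateVertex sv) {
        try {
            // getStrippedDom() returns the current page source as a String
            String html = cc.getBrowser().getStrippedDom();
            // "crawl-output" and the file name are illustrative choices
            Path out = Paths.get("crawl-output", sv.getName() + ".html");
            Files.createDirectories(out.getParent());
            Files.write(out, html.getBytes(StandardCharsets.UTF_8));
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
});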


To get the website content, use the following function:

cc.getCurrentState().getDom()

This function does not return a DOM node; it actually returns the page's HTML as text. It is the right function to use if you want the page content, although the name getDom is a misnomer, since it sounds like it returns a DOM node. To get a DOM node instead, use:

cc.getCurrentState().getDocument()

which returns the Document DOM node.

You can retrieve the page content with:

cc.getCurrentState().getDocument().getTextContent()

(EDIT: This won't work -- getTextContent always returns null when called on Documents.)
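As a possible workaround (a sketch only, assuming getDocument() returns a standard org.w3c.dom.Document), you can call getTextContent() on the document's root element instead, for example inside the onNewState callback:

try {
    org.w3c.dom.Document doc = cc.getCurrentState().getDocument();
    // getTextContent() is defined to return null for Document nodes,
    // but on the root element it returns the page's concatenated text.
    String pageText = doc.getDocumentElement().getTextContent();
    System.out.println(pageText);
} catch (Exception e) {
    e.printStackTrace();
}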
