
I am developing a grails app using crawler4j.

I know this is an old question and I came across this solution here.

I tried the solution provided, but I am not sure where to put the extra fetcher and MockSSL Java files.

Also, I am not sure how these two classes get called for URLs that start with https://...

Thanks in advance.

clever_bassi

1 Answer


The solution works fine. Maybe the difficulty is working out where to put the code. Here is how I use it:

When creating the crawler, you will have something like this in your main class, as shown in the official documentation:

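import edu.uci.ics.crawler4j.crawler.CrawlConfig;
import edu.uci.ics.crawler4j.crawler.CrawlController;
import edu.uci.ics.crawler4j.fetcher.PageFetcher;
import edu.uci.ics.crawler4j.robotstxt.RobotstxtConfig;
import edu.uci.ics.crawler4j.robotstxt.RobotstxtServer;

public class Controller {
    public static void main(String[] args) throws Exception {
        CrawlConfig config = new CrawlConfig();
        // crawlStorageFolder is a String path that you define yourself.
        config.setCrawlStorageFolder(crawlStorageFolder);

        /*
         * Instantiate the controller for this crawl, passing the custom
         * fetcher where the default PageFetcher would normally go.
         */
        PageFetcher pageFetcher = new MockSSLSocketFactory(config);
        RobotstxtConfig robotstxtConfig = new RobotstxtConfig();
        RobotstxtServer robotstxtServer = new RobotstxtServer(robotstxtConfig, pageFetcher);
        CrawlController controller = new CrawlController(config, pageFetcher, robotstxtServer);
        ....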

Here you use MockSSLSocketFactory, which is defined as shown in the link you posted:

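import edu.uci.ics.crawler4j.crawler.CrawlConfig;
import edu.uci.ics.crawler4j.fetcher.PageFetcher;
import org.apache.http.conn.scheme.Scheme;

public class MockSSLSocketFactory extends PageFetcher {

    public MockSSLSocketFactory(CrawlConfig config) {
        super(config);

        // The swap only happens when HTTPS crawling is enabled in the config.
        if (config.isIncludeHttpsPages()) {
            try {
                // Replace the default "https" handler with the permissive factory.
                httpClient.getConnectionManager().getSchemeRegistry().unregister("https");
                httpClient.getConnectionManager().getSchemeRegistry()
                        .register(new Scheme("https", 443, new SimpleSSLSocketFactory()));
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
}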
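On the question of how these classes get called for https:// URLs: nothing calls them explicitly. The fetcher looks up a handler by the URL's scheme, so replacing the "https" entry is enough. The following is only a toy model of that lookup using plain JDK collections; `SchemeLookupSketch`, `replace` and `handlerFor` are hypothetical names for illustration and are not part of crawler4j or HttpClient.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

class SchemeLookupSketch {

    // Hypothetical stand-in for a scheme registry: scheme name -> handler label.
    private final Map<String, String> registry = new HashMap<>();

    SchemeLookupSketch() {
        registry.put("http", "plain socket factory");
        registry.put("https", "default SSL socket factory");
    }

    // Mirrors the unregister("https") + register(...) pair in the constructor above.
    void replace(String scheme, String handler) {
        registry.remove(scheme);
        registry.put(scheme, handler);
    }

    // The handler is chosen from the URL's scheme; the caller never names it.
    String handlerFor(String url) {
        return registry.get(URI.create(url).getScheme());
    }

    public static void main(String[] args) {
        SchemeLookupSketch client = new SchemeLookupSketch();
        client.replace("https", "SimpleSSLSocketFactory");
        System.out.println(client.handlerFor("https://example.com/"));
        System.out.println(client.handlerFor("http://example.com/"));
    }
}
```

In this model, only URLs whose scheme is "https" are routed to the replaced handler; "http" URLs are unaffected.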

MockSSLSocketFactory relies on the class SimpleSSLSocketFactory, which can be defined as in the example from the link:

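import java.io.IOException;
import java.security.KeyManagementException;
import java.security.KeyStoreException;
import java.security.NoSuchAlgorithmException;
import java.security.UnrecoverableKeyException;
import java.security.cert.CertificateException;
import javax.net.ssl.SSLException;
import javax.net.ssl.SSLSession;
import javax.net.ssl.SSLSocket;
import org.apache.http.conn.ssl.SSLSocketFactory;
import org.apache.http.conn.ssl.TrustStrategy;
import org.apache.http.conn.ssl.X509HostnameVerifier;

public class SimpleSSLSocketFactory extends SSLSocketFactory {

    public SimpleSSLSocketFactory() throws NoSuchAlgorithmException, KeyManagementException,
            KeyStoreException, UnrecoverableKeyException {
        super(trustStrategy, hostnameVerifier);
    }

    // Hostname verifier that accepts any host name without checking it.
    private static final X509HostnameVerifier hostnameVerifier = new X509HostnameVerifier() {
        @Override
        public void verify(String host, SSLSocket ssl) throws IOException {
            // Do nothing
        }

        @Override
        public void verify(String host, String[] cns, String[] subjectAlts) throws SSLException {
            // Do nothing
        }

        @Override
        public boolean verify(String s, SSLSession sslSession) {
            return true;
        }

        @Override
        public void verify(String arg0, java.security.cert.X509Certificate arg1) throws SSLException {
            // Do nothing
        }
    };

    // Trust strategy that accepts every certificate chain without validating it.
    private static final TrustStrategy trustStrategy = new TrustStrategy() {
        @Override
        public boolean isTrusted(java.security.cert.X509Certificate[] arg0, String arg1)
                throws CertificateException {
            return true;
        }
    };
}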
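For comparison, the same accept-everything idea can be sketched with JDK classes only, which may make it clearer what the trust strategy above does. This is a minimal sketch and not the crawler4j or HttpClient approach; `TrustAllSketch`, `TRUST_ALL` and `trustAllContext` are hypothetical names introduced here.

```java
import java.security.GeneralSecurityException;
import java.security.cert.X509Certificate;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;

class TrustAllSketch {

    // Accepts every certificate chain, like isTrusted(...) returning true above.
    static final X509TrustManager TRUST_ALL = new X509TrustManager() {
        @Override
        public void checkClientTrusted(X509Certificate[] chain, String authType) {
            // No validation performed
        }

        @Override
        public void checkServerTrusted(X509Certificate[] chain, String authType) {
            // No validation performed
        }

        @Override
        public X509Certificate[] getAcceptedIssuers() {
            return new X509Certificate[0];
        }
    };

    // Builds a TLS context whose only trust manager is the permissive one.
    static SSLContext trustAllContext() {
        try {
            SSLContext context = SSLContext.getInstance("TLS");
            context.init(null, new TrustManager[] { TRUST_ALL }, null);
            return context;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Could not create TLS context", e);
        }
    }

    public static void main(String[] args) {
        SSLContext context = trustAllContext();
        System.out.println(context.getProtocol());
    }
}
```

Sockets created from `trustAllContext().getSocketFactory()` would skip certificate validation in the same way, so a setup like this only makes sense for crawling sites whose certificates you do not need to verify.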

As you can see, I am only copying code from the official documentation and from the link you posted, but I hope that seeing it all together makes it clearer.

King Midas
  • Where do I put these classes? – clever_bassi Aug 25 '14 at 17:38
  • These classes must be in your own project. Controller will be your main application class, the one that runs when you execute your application. The other classes can be anywhere in your project; the only restriction is that they must be accessible from Controller. You can copy these classes as they are into your project. – King Midas Aug 26 '14 at 08:01