
I am looking to crawl an entire website and save it locally for offline use. The task has two parts:

  1. Authentication

This needs to be implemented in Java. I need to override the HttpsURLConnection logic to add a couple of lines of Hadoop (Kerberos/keytab) authentication in order to fetch the URL response. Something like below:

     // org.apache.hadoop.security.authentication.client.AuthenticatedURL (hadoop-auth jar)
     AuthenticatedURL.Token token = new AuthenticatedURL.Token();
     URL ur = new URL(url);
     //HttpsURLConnection.setDefaultHostnameVerifier(new HostnameVerifierSSL());
     HttpsURLConnection con = (HttpsURLConnection) new AuthenticatedURL().openConnection(ur, token);
  2. Once all the links go through the above authentication, we need to crawl the entire website to depth = 3 and save it locally, offline, as a zip.
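For part 2, one possible shape is a breadth-first crawl capped at depth = 3, with the page fetch abstracted behind a `Function<String, String>` so the `AuthenticatedURL`/`HttpsURLConnection` logic from part 1 can be plugged in. This is only a sketch under those assumptions; the class name, the regex-based link extraction, and the fetcher signature are all hypothetical (a real crawler would use an HTML parser such as jsoup and resolve relative URLs):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: breadth-first crawl limited to maxDepth hops from the start URL.
// The fetcher argument stands in for the authenticated connection from part 1.
public class DepthLimitedCrawler {

    private static final Pattern HREF = Pattern.compile("href=\"(http[^\"]+)\"");

    // Very naive link extraction via regex; a real crawler should use an HTML parser.
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    // Returns url -> page body for every page reachable within maxDepth hops.
    static Map<String, String> crawl(String startUrl, int maxDepth,
                                     Function<String, String> fetcher) {
        Map<String, String> saved = new LinkedHashMap<>();
        Deque<Object[]> queue = new ArrayDeque<>();          // holds {url, depth}
        queue.add(new Object[]{startUrl, 0});
        while (!queue.isEmpty()) {
            Object[] entry = queue.poll();
            String url = (String) entry[0];
            int depth = (Integer) entry[1];
            if (saved.containsKey(url)) continue;            // already visited
            String body = fetcher.apply(url);
            if (body == null) continue;                      // fetch failed; skip
            saved.put(url, body);
            if (depth < maxDepth) {
                for (String link : extractLinks(body)) {
                    queue.add(new Object[]{link, depth + 1});
                }
            }
        }
        return saved;
    }

    public static void main(String[] args) {
        // Fake in-memory "site" standing in for the authenticated fetcher.
        Map<String, String> site = new LinkedHashMap<>();
        site.put("http://a", "<a href=\"http://b\">b</a>");
        site.put("http://b", "<a href=\"http://c\">c</a>");
        site.put("http://c", "no links");
        Map<String, String> pages = crawl("http://a", 3, site::get);
        System.out.println(pages.keySet()); // [http://a, http://b, http://c]
    }
}
```

Wiring in the real fetcher would mean a lambda that opens the connection via `new AuthenticatedURL().openConnection(ur, token)` and reads the response body.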

Let me know possible solutions.
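For the "save as a zip" step, the standard library's `java.util.zip` is enough: once the crawl has produced a map of url -> page body, each page can be written as an entry in a zip archive. A minimal sketch, assuming crawled pages are already in memory; the entry-naming scheme (replacing characters that are illegal in file names) is an assumption:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Sketch: write crawled pages (url -> body) into a zip archive in memory.
public class SiteZipper {

    // Hypothetical naming scheme: flatten the URL into a legal file name.
    static String entryName(String url) {
        return url.replaceAll("[^A-Za-z0-9.-]", "_") + ".html";
    }

    static byte[] zipPages(Map<String, String> pages) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (Map.Entry<String, String> page : pages.entrySet()) {
                zip.putNextEntry(new ZipEntry(entryName(page.getKey())));
                zip.write(page.getValue().getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> pages = new LinkedHashMap<>();
        pages.put("http://example.com/index", "<html>home</html>");
        pages.put("http://example.com/about", "<html>about</html>");
        byte[] zipped = zipPages(pages);

        // Read the archive back to confirm both entries round-trip.
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(zipped))) {
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                System.out.println(entry.getName());
            }
        }
    }
}
```

Writing to a `FileOutputStream` instead of the in-memory buffer would produce the offline zip on disk.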
