
I need to fetch all of the HTML on a page.

I do it like this:

URL url = new URL(testurl);
URLConnection connection = url.openConnection();
connection.connect();
Scanner in = new Scanner(connection.getInputStream());
while (in.hasNextLine()) {
    htmlText = htmlText + in.nextLine();
}
in.close();

But if the page is large, this takes a long time.

Is there a faster method?

HelloWorld

2 Answers


Have you tried a different method of reading the page, such as a BufferedReader? See Reading the content of web page or Reading entire html file to String.

I'm just thinking Scanner may be a little slow. Note also that concatenating onto a String inside the loop copies the whole buffer on every iteration; accumulating into a StringBuilder avoids that.
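A minimal sketch of that approach, reading the stream through a BufferedReader and accumulating into a StringBuilder (the class name `PageReader` and the example URL are placeholders, not from the question):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;

public class PageReader {
    // Accumulates the stream into a StringBuilder instead of concatenating
    // Strings in a loop: each String '+' copies the whole buffer so far,
    // which makes the original loop quadratic for large pages.
    public static String readAll(InputStream stream) throws IOException {
        StringBuilder html = new StringBuilder();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(stream))) {
            String line;
            while ((line = in.readLine()) != null) {
                html.append(line).append('\n');
            }
        }
        return html.toString();
    }

    public static void main(String[] args) throws IOException {
        // Placeholder URL; substitute the page you actually want to fetch
        String testurl = "http://example.com";
        String htmlText = readAll(new URL(testurl).openConnection().getInputStream());
        System.out.println(htmlText.length());
    }
}
```

Note that `readLine()` strips line terminators, so the sketch re-appends a `'\n'` to preserve line breaks.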

Tim


Try using JSoup (http://jsoup.org) to download and parse the HTML from the URL.

You can get the HTML as a Document and read the text of each element.

new AsyncTask<Void, Void, String>() {
    @Override
    protected String doInBackground(Void... params) {
        try {
            // Network and parsing happen off the UI thread
            Document doc = Jsoup.connect("http://youturl.com").get();
            // Extract the text of the whole <body>
            return doc.body().text();
        } catch (IOException e) {
            e.printStackTrace();
            return null;
        }
    }

    @Override
    protected void onPostExecute(String content) {
        // Runs on the UI thread; use the extracted text here
    }
}.execute();
Libin