
For my app (supporting Android 2.2+) I have to check the HTML of a lot of different web pages (approx. 700) and retrieve a single name from each page. I have all the URLs stored in an array.

I now use a single AsyncTask and iterate over the array of URLs like this:

(snippet from the AsyncTask's doInBackground)

publishProgress(urls.size());
int a = 0;
for (String code : urls) {
    if (!running) return null;
    try {
        URL url = new URL(code);
        HttpURLConnection con = (HttpURLConnection) url.openConnection();
        // Map the name found on the page to the URL it came from.
        naam_codes.put(readStream(con.getInputStream(), true).get(0), code);
    } catch (Exception e) {
        // Any failure makes the next iteration return early.
        running = false;
    }
    publishProgress(++a);
}

and readStream being:

BufferedReader reader = null;
ArrayList<String> html = new ArrayList<String>();
try {
    reader = new BufferedReader(new InputStreamReader(in, Charset.forName("ISO-8859-1")));
    if (snel) {
        // reading, matching and stuff
    } else {
        // other reading, matching and stuff
    }
} catch (IOException e) {
    // pass
} finally {
    if (reader != null) {
        try {
            reader.close();
        } catch (IOException e) {
            return null;
        }
    }
}
return html;

Now my problem is that it has to wait for one download and match to finish before starting the next one. It should be possible to speed this up, right? After monitoring for a bit, the process doesn't seem to fully use either the CPU or the internet bandwidth(?). Instead of iterating inside one AsyncTask, should I iterate on the UI thread and execute multiple AsyncTasks? If so, how?

XorJoep

1 Answer


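How multiple AsyncTasks are scheduled depends on the API level. From API 11 (HONEYCOMB) on, you can create one AsyncTask per download/parsing and have them executed in parallel by calling executeOnExecutor with the parameter AsyncTask.THREAD_POOL_EXECUTOR. That method does not exist before API 11; on those versions, per the documentation quoted below, tasks started with execute() already ran on a thread pool from DONUT on.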

From the documentation:

Order of execution

When first introduced, AsyncTasks were executed serially on a single background thread. Starting with DONUT, this was changed to a pool of threads allowing multiple tasks to operate in parallel. Starting with HONEYCOMB, tasks are executed on a single thread to avoid common application errors caused by parallel execution.

If you truly want parallel execution, you can invoke executeOnExecutor(java.util.concurrent.Executor, Object[]) with THREAD_POOL_EXECUTOR.
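
If you want the same behaviour on every API level, including 2.2, one option is to skip AsyncTask's scheduler and use a plain java.util.concurrent thread pool. The following is only a minimal sketch under a few assumptions: ParallelNameFetcher, NameExtractor and fetchAll are hypothetical names of mine, not Android APIs, and the pool size is something you would have to tune yourself.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper, not part of Android: runs one extraction per URL on a thread pool.
final class ParallelNameFetcher {

    /** Hypothetical hook: download one page and return the name found in it. */
    interface NameExtractor {
        String extractName(String url) throws Exception;
    }

    /**
     * Runs the extractor for every URL on a fixed-size pool and returns a
     * name -> URL map. URLs whose extraction throws are skipped.
     */
    static Map<String, String> fetchAll(List<String> urls,
                                        final NameExtractor extractor,
                                        int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit everything first so the downloads overlap.
            List<Future<String>> pending = new ArrayList<Future<String>>();
            for (final String url : urls) {
                pending.add(pool.submit(new Callable<String>() {
                    @Override
                    public String call() throws Exception {
                        return extractor.extractName(url);
                    }
                }));
            }
            // Collect results in the original URL order.
            Map<String, String> nameToUrl = new LinkedHashMap<String, String>();
            for (int i = 0; i < urls.size(); i++) {
                try {
                    nameToUrl.put(pending.get(i).get(), urls.get(i));
                } catch (ExecutionException e) {
                    // This page failed; skip it and keep going.
                } catch (InterruptedException e) {
                    // Restore the interrupt flag and stop waiting for results.
                    Thread.currentThread().interrupt();
                    break;
                }
            }
            return nameToUrl;
        } finally {
            pool.shutdown();
        }
    }
}
```

In your case the extractor would presumably wrap the existing code, i.e. open the HttpURLConnection and return readStream(con.getInputStream(), true).get(0). You would still call fetchAll from a background thread (for example from doInBackground of a single AsyncTask), because it blocks until all pages are done.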


If I were you, I would build my own server (just a cron job launching a PHP script somewhere, a MySQL database, and a PHP script to serve your data) and I would not let the applications do the processing.

Let your server do the 700 downloads, parse them, and store what you need in a database. Then let your applications call your server script, which picks the required info from your database.

Advantages:

  • Your server has better bandwidth
  • It has more processing power
  • Your apps can request whatever data they need instead of downloading & parsing several hundreds of pages.

Disadvantage:

  • You may introduce a short delay before new data becomes available (it depends on how often your cron job runs and how long it takes to update the database)
Vincent Mimoun-Prat
  • Thanks for your answer, I will start by trying your first suggestion. If that does not satisfy me, I'll try to build my own server, which will take a lot more time since I've never done any SQL or PHP, just as I had never made an app or written any Java before this (my first) app. – XorJoep Oct 21 '12 at 10:31