
The problem with the code below is that I have to wait for all three tasks to finish.

If the 1st and 2nd tasks complete in 200ms and the 3rd completes in 2s, then I have to wait 2s before I can load the next three URLs.

Ideally, I would send a new request as soon as any task finishes, and somehow block the main thread until the ArrayList is empty.

In simple terms, I would like each CompletableFuture to run in a kind of loop, where the completion of the previous task triggers the next one.

(I do this quite often in JavaScript using events.)

Can anybody think how I might achieve this?

    private static void httpClientExample() {

        ArrayList<String> urls = new ArrayList<>(
                Arrays.asList(
                        "https://www.bing.com/",
                        "https://openjdk.java.net/",
                        "https://openjdk.java.net/",
                        "https://google.com/",
                        "https://github.com/",
                        "https://stackoverflow.com/"
                ));

        HttpClient httpClient = HttpClient.newHttpClient();

        var task1 = httpClient.sendAsync(HttpRequest.newBuilder()
                .uri(URI.create(urls.get(0)))
                .build(), HttpResponse.BodyHandlers.ofString())
                .thenApply(HttpResponse::uri).thenAccept(System.out::println);

        var task2 = httpClient.sendAsync(HttpRequest.newBuilder()
                .uri(URI.create(urls.get(1)))
                .build(), HttpResponse.BodyHandlers.ofString())
                .thenApply(HttpResponse::uri).thenAccept(System.out::println);

        var task3 = httpClient.sendAsync(HttpRequest.newBuilder()
                .uri(URI.create(urls.get(2)))
                .build(), HttpResponse.BodyHandlers.ofString())
                .thenApply(HttpResponse::uri).thenAccept(System.out::println);

        // All tasks have to complete
        CompletableFuture.allOf(task1, task2, task3).join();

        // Get the next 3 URLs

        System.out.println("Main Thread Completed");
    }
Adrian Smith
  • Is there any reason why you want to wait for all 3 requests to finish before continuing? If the only requirement is to make at most 3 parallel requests, you could use an `ExecutorService` with at most 3 threads. – dpr Oct 21 '20 at 19:32
  • @dpr No reason at all. I just wanted to make sure that all three tasks completed before the main thread dropped out. – Adrian Smith Oct 22 '20 at 07:52
  • You mean all 6 (number of URLs) tasks? – dpr Oct 22 '20 at 08:03
  • I updated my answer. It became way simpler without the "max 3 calls in parallel" requirement. – dpr Oct 22 '20 at 08:15

2 Answers


Letting each job itself remove the next pending URL and submit it would require a thread-safe queue.

It might be easier to let the main thread do it, e.g. like this:

var httpClient = HttpClient.newHttpClient();
var pending = new ArrayDeque<CompletableFuture<?>>(3);
for(String url: urls) {
    // at the limit: prune finished jobs; if none has finished yet, block until any one does
    while(pending.size() >= 3 && !pending.removeIf(CompletableFuture::isDone))
        CompletableFuture.anyOf(pending.toArray(CompletableFuture<?>[]::new)).join();

    pending.addLast(httpClient.sendAsync(HttpRequest.newBuilder()
            .uri(URI.create(url))
            .build(), HttpResponse.BodyHandlers.ofString())
            .thenApply(HttpResponse::uri).thenAccept(System.out::println));
}
// wait for the (at most three) jobs still running when the loop ends
CompletableFuture.allOf(pending.toArray(CompletableFuture<?>[]::new)).join();

This will wait until at least one of the three submitted jobs has completed (using `anyOf`/`join`) before submitting the next one. When the loop ends, up to three jobs may still be running. The subsequent `allOf`/`join` after the loop waits for those jobs, so afterwards all jobs have completed. If you want the initiating thread to proceed as soon as all jobs have been submitted, without waiting for their completion, just remove the last statement.
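For comparison, the job-driven variant mentioned at the start of this answer (each completion submitting the next URL, close to the event style described in the question) could be sketched roughly as follows. This is an illustrative sketch, not part of the original answer: `ChainedRunner`, `runChained` and `next` are hypothetical names, and the `task` function stands in for the `httpClient.sendAsync(...)` chain so the pattern can be tried without network access. A `ConcurrentLinkedQueue` plays the role of the thread-safe queue.

```java
import java.util.List;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Function;

// Hypothetical helper: each completed task triggers the next one from a shared, thread-safe queue.
class ChainedRunner {

    // Starts maxParallel independent chains over one queue. The returned future completes
    // when every chain has finished, and completes exceptionally if any chain failed.
    static <T> CompletableFuture<Void> runChained(List<T> items, int maxParallel,
                                                  Function<T, CompletableFuture<?>> task) {
        Queue<T> queue = new ConcurrentLinkedQueue<>(items); // polled concurrently by all chains
        CompletableFuture<?>[] chains = new CompletableFuture<?>[maxParallel];
        for (int i = 0; i < maxParallel; i++) {
            chains[i] = next(queue, task);
        }
        return CompletableFuture.allOf(chains);
    }

    // Takes one item and, once its task has completed, schedules the next item of the same chain.
    private static <T> CompletableFuture<Void> next(Queue<T> queue,
                                                    Function<T, CompletableFuture<?>> task) {
        T item = queue.poll();
        if (item == null) {
            return CompletableFuture.completedFuture(null); // queue drained: this chain ends
        }
        return task.apply(item).thenCompose(ignored -> next(queue, task));
    }
}
```

With the question's client, the task could be `url -> httpClient.sendAsync(HttpRequest.newBuilder(URI.create(url)).build(), HttpResponse.BodyHandlers.ofString()).thenApply(HttpResponse::uri).thenAccept(System.out::println)`, followed by `.join()` on the returned future. One consequence of this design: if a task fails, its chain stops, the other chains keep draining the queue, and the returned future completes exceptionally at the end.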

Eugene
Holger
  • Many thanks for your excellent solution. I have tested it and it works well. I like how your solution prunes out all of the completed tasks. It seems possible, though, that the ArrayDeque could fill up and overflow before the tasks complete (i.e. before the HTTP requests return). I'm trying to build a crawler that will handle upwards of 1,000,000 URLs. Can you think of a simple way to modify your solution so that it chunks the data, letting me store the results somewhere before the next chunk starts? Say, 100 URLs at a time? – Adrian Smith Oct 23 '20 at 09:15
  • The queue cannot overflow, as the loop waits for the completion of at least one job before putting another one. That’s what the `anyOf`/`join` does in the `while(pending.size() >= 3 …` loop. You only have to adapt the number if you want to allow more at a time. The other `3` in the `new ArrayDeque<CompletableFuture<?>>(3)` is only an optimization, to match the initial capacity with the usage. Still, for production code it’s worth fusing both numbers into a named constant (or parameter). – Holger Oct 23 '20 at 09:19
  • Sorry, I missed that. A very elegant solution. Many thanks for your help. Kind Regards. – Adrian Smith Oct 23 '20 at 09:28
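Following the suggestion above to fuse both `3`s into a named constant or parameter, the main-thread loop from this answer might be generalized as below. This is a sketch, not the answerer's code: `BoundedSubmitter`, `runBounded`, `maxParallel` and the `task` function are hypothetical, with `task` standing in for the `sendAsync(...)` chain so the loop can be exercised offline.

```java
import java.util.ArrayDeque;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Hypothetical generalization of the answer's loop: one parameter replaces both literal 3s.
class BoundedSubmitter {

    // Submits one task per item from the calling thread, keeping at most maxParallel in flight,
    // and returns once all of them have completed.
    static <T> void runBounded(List<T> items, int maxParallel,
                               Function<T, CompletableFuture<?>> task) {
        var pending = new ArrayDeque<CompletableFuture<?>>(maxParallel);
        for (T item : items) {
            // at the limit: prune finished tasks; if none has finished, block until any one does
            while (pending.size() >= maxParallel && !pending.removeIf(CompletableFuture::isDone)) {
                CompletableFuture.anyOf(pending.toArray(CompletableFuture<?>[]::new)).join();
            }
            pending.addLast(task.apply(item));
        }
        // wait for the tasks still running when the loop ends
        CompletableFuture.allOf(pending.toArray(CompletableFuture<?>[]::new)).join();
    }
}
```

A call such as `runBounded(urls, 100, url -> ...)` would keep up to 100 requests in flight while the deque never holds more than 100 futures. Note that `join()` can rethrow a failed job as a `CompletionException`, which ends the loop early; handling failures inside `task` (e.g. with `exceptionally`) keeps one bad URL from stopping the whole run.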

If you don't need to limit the maximum number of parallel calls, things become a lot easier:

private static void httpClientExample() throws Exception {

  final ArrayList<String> urls = ...; //list of urls 

  // explicit bounded pool instead of the HttpClient default (see the comments below);
  // threads from newFixedThreadPool are non-daemon, so the pool should be shut down when done
  final HttpClient httpClient = HttpClient.newBuilder().executor(
                                    Executors.newFixedThreadPool(10)).build();

  final List<CompletableFuture<Void>> allFutures = new ArrayList<>();
  for (String url : urls) {
    final CompletableFuture<Void> completableFuture = httpClient
        .sendAsync(HttpRequest.newBuilder().uri(URI.create(url)).build(),
            HttpResponse.BodyHandlers.ofString())
        .thenApply(HttpResponse::uri).thenAccept(System.out::println);
    allFutures.add(completableFuture);
  }

  // block until every request has completed
  CompletableFuture.allOf(allFutures.toArray(CompletableFuture[]::new)).get();
}
dpr
  • When you use `executorService = Executors.newFixedThreadPool(3);` anyway, it might be easier to use `executorService.invokeAll(…)`, as the `Callable` interface allows checked exceptions. – Holger Oct 21 '20 at 19:55
  • Yes, the handling of checked exceptions is a sad thing in the daily use of lambdas in Java. With `ExecutorService.submit` or `invokeAll` I'll only get `Future`s (a `List<Future<T>>` from `invokeAll`), and these are quite bulky to combine... – dpr Oct 21 '20 at 20:21
  • Yes, but in this specific case, you don’t need to combine them, as the result is not used. The OP only used the futures to wait for the completion, but this is already done by `invokeAll` itself. – Holger Oct 22 '20 at 07:16
  • Indeed, I thought `invokeAll` would only submit a list of callables instead of waiting for their completion. – dpr Oct 22 '20 at 08:03
  • @AdrianSmith Wow, going from 6 to 1,000,000! No, if you think the number of URLs might become significantly large (> 1k), I don't think it's a good idea to put all the futures into one data structure. I wouldn't even put all the URLs into one array in the first place... – dpr Oct 23 '20 at 08:57
  • @dpr Many thanks for your excellent solution. Would this solution work with 1,000,000 urls though? Couldn't the allFutures ArrayList overflow? – Adrian Smith Oct 23 '20 at 09:04
  • @dpr Thanks for your help. I really appreciate it. Kind Regards, Adrian – Adrian Smith Oct 23 '20 at 09:07
  • There is a small problem with this approach: you are not specifying an executor for the `HttpClient`, and by default it will do `if (ex == null) { ex = Executors.newCachedThreadPool(new DefaultThreadFactory(id));`, i.e. it will try to create a new thread for each request if one is not available. Just try changing your code to `ArrayList<String> urls = new ArrayList<>(Collections.nCopies(1500, "https://www.google.com/"));`. Depending on the Java version, it might hang, die with an `OOM`, or fail in some mysterious way. – Eugene Oct 31 '20 at 00:25
  • @Eugene Thanks for pointing this out. Your point is totally valid and I changed the answer accordingly. As already mentioned, my answer is not suitable when it comes to processing a large number of requests. However, I think choosing an unlimited thread pool as the default for HttpClient is an "interesting" choice... – dpr Nov 02 '20 at 10:07
  • @dpr In early versions it was even _documented_ to use such a pool, but that was removed, hinting that the implementation will also change in the near future... – Eugene Nov 02 '20 at 10:12
  • ...but now that you have set up a pool and all you want to do is print the result, you do not need `final List<CompletableFuture<Void>> allFutures = new ArrayList<>();`, and obviously you do not need `CompletableFuture.allOf(....)` anymore either. – Eugene Nov 02 '20 at 16:36
  • The initial question was "how to wait for a set of completable futures to finish". And the output of the url is just an example of a consumer for the request's result... – dpr Nov 02 '20 at 16:39
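Holger's `invokeAll` suggestion from the comments above might look roughly like the following sketch. It is illustrative only: `InvokeAllExample`, `runAll` and the `CheckedFunction` interface are hypothetical names, and the work function is a parameter so the pattern can be exercised without network access. The fixed pool bounds how many jobs run at once, `invokeAll` itself waits for all of them, and the pool is shut down afterwards.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class InvokeAllExample {

    // Hypothetical helper interface: like Function, but allowed to throw checked exceptions.
    interface CheckedFunction<T, R> {
        R apply(T t) throws Exception;
    }

    // Runs work(item) for every item on a pool of `threads` threads and returns results in input order.
    static <T, R> List<R> runAll(List<T> items, int threads, CheckedFunction<T, R> work)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads); // at most `threads` jobs at once
        try {
            List<Callable<R>> jobs = new ArrayList<>();
            for (T item : items) {
                jobs.add(() -> work.apply(item)); // Callable.call() may throw checked exceptions
            }
            // invokeAll blocks until every job has completed, normally or exceptionally
            List<Future<R>> futures = pool.invokeAll(jobs);
            List<R> results = new ArrayList<>(futures.size());
            for (Future<R> f : futures) {
                results.add(f.get()); // already done; rethrows a job's failure as ExecutionException
            }
            return results;
        } finally {
            pool.shutdown(); // pool threads are non-daemon and would otherwise keep the JVM alive
        }
    }
}
```

With the question's client, each job could use the blocking `send` call, whose checked exceptions fit `CheckedFunction`: `runAll(urls, 3, url -> httpClient.send(HttpRequest.newBuilder(URI.create(url)).build(), HttpResponse.BodyHandlers.ofString()).uri())`. Since `invokeAll` keeps one `Future` per job, a very large URL list would still need to be fed in batches, as discussed in the comments.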