Fetching data simultaneously in Mojolicious

Question

I'm trying to run multiple subroutines in parallel (to fetch data from external systems). To simulate the delay, I use sleep in my example below. My question is: how can I achieve this in Mojolicious?

#!/usr/bin/perl
use Mojolicious::Lite;
use Benchmark qw(:hireswallclock);

sub  add1 { my $a = shift; sleep 1; return $a+1; }
sub mult2 { my $b = shift; sleep 1; return $b*2; }
sub power { my ($x, $y) = @_; sleep 1; return $x ** $y; }

any '/' => sub {    
    my ( $self ) = @_;

    my $n = int(rand(5));

    my $t0 = Benchmark->new;
    my $x = mult2($n); # Need to run in parallel
    my $y =  add1($n); # Need to run in parallel
    my $z = power($x,$y);
    my $t1 = Benchmark->new;
    my $t = timediff($t1,$t0)->real();

    $self->render(text => "n=$n, x=$x, y=$y, z=$z;<br>T=$t seconds");
};

app->start;

In other words, I'd like to reduce the time it takes to run down to 2 seconds (instead of 3) by running (add1 & mult2) in parallel.

This thread uses a Mojo::IOLoop->timer which doesn't seem relevant in my case? If so, I don't know how to use it. Thanks!

https://metacpan.org/pod/Mojo::IOLoop::ReadWriteFork – mpapec Apr 08 '18 at 08:40

amon · Accepted Answer · 2018-04-08T17:48:56.087

To avoid long waiting times, you can use the Mojolicious non-blocking operations. Rather than making a synchronous request to an external system, use non-blocking methods that run a callback upon completion. For example, to avoid a blocking sleep, we would use Mojo::IOLoop->timer(...).

Here is a variant of your code that uses non-blocking operations, and uses Promises to properly sequence the functions:

use Mojolicious::Lite;
use Mojo::IOLoop;
use Mojo::Promise;
use Benchmark qw(:hireswallclock);

# example using non-blocking APIs
sub add1 {
    my ($a) = @_;
    my $promise = Mojo::Promise->new;
    Mojo::IOLoop->timer(1 => sub { $promise->resolve($a + 1) });
    return $promise;
}

# example using blocking APIs in a subprocess
sub mult2 {
    my ($b) = @_;
    my $promise = Mojo::Promise->new;
    Mojo::IOLoop->subprocess(
        sub {  # first callback is executed in subprocess
            sleep 1;
            return $b * 2;
        },
        sub {  # second callback resolves promise with subprocess result
            my ($self, $err, @result) = @_;
            return $promise->reject($err) if $err;
            $promise->resolve(@result);
        },
    );
    return $promise;
}

sub power {
    my ($x, $y) = @_;
    my $result = Mojo::Promise->new;
    Mojo::IOLoop->timer(1 => sub { $result->resolve($x ** $y) });
    return $result;
}

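any '/' => sub {
    my ( $self ) = @_;

    # the response is rendered later, from the promise callback
    $self->render_later;

    my $n = int(rand(5));

    my $t0 = Benchmark->new;
    # both operations start right away and run concurrently
    my $x_promise = mult2($n);
    my $y_promise = add1($n);
    # power() only starts once both $x and $y are available
    my $z_promise = Mojo::Promise->all($x_promise, $y_promise)
        ->then(sub {
            my ($x, $y) = map { $_->[0] } @_;
            return power($x, $y);
        });
    Mojo::Promise->all($x_promise, $y_promise, $z_promise)
        ->then(sub {
            my ($x, $y, $z) = map { $_->[0] } @_;
            my $t1 = Benchmark->new;
            my $t = timediff($t1, $t0)->real();

            $self->render(text => "n=$n, x=$x, y=$y, z=$z;\nT=$t seconds\n");
        })
        ->catch(sub {
            # a rejected promise (e.g. a failed subprocess) ends up here
            my ($err) = @_;
            $self->reply->exception($err);
        })
        ->wait;
};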

app->start;

This is a lot more complex than your code, but completes within two seconds instead of three seconds, and does not block other requests that happen at the same time. So you could request this route thirty times at once, and get thirty responses two seconds later!

Note that Mojo::Promise->all returns a promise that resolves with the values of all awaited promises, but wraps the values of each promise in an array ref, so we need to unpack those to get the actual values.
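
As a minimal sketch of that unpacking, the snippet below resolves two promises from timers and collects what the then callback receives. The later helper, the values 2 and 3, and the delays are made up for illustration.

```perl
use strict;
use warnings;
use Mojo::IOLoop;
use Mojo::Promise;

# Hypothetical helper: resolves with the given values after a delay.
sub later {
    my ($delay, @values) = @_;
    my $promise = Mojo::Promise->new;
    Mojo::IOLoop->timer($delay => sub { $promise->resolve(@values) });
    return $promise;
}

my @unpacked;
Mojo::Promise->all(later(0.2, 2), later(0.1, 3))->then(sub {
    # @_ holds one array ref per promise, in the order they were passed
    # to all(), not in the order they resolved: ([2], [3]).
    @unpacked = map { $_->[0] } @_;
})->wait;    # outside a running event loop, wait starts the loop

print "@unpacked\n";    # prints "2 3"
```

The array refs exist because a promise can resolve with more than one value; taking element 0 is enough when each promise resolves with a single value, as here.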

If you cannot use non-blocking operations, you can run the blocking code in a subprocess, using Mojo::IOLoop->subprocess(...). You can still coordinate the data flow through promises; the mult2 function above shows how.
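
If the blocking subs have to stay untouched, the same subprocess pattern can be factored into a small wrapper. This is only a sketch: in_subprocess and slow_mult2 are hypothetical names, not part of Mojolicious, and the wrapper assumes the return values can be serialized back to the parent process.

```perl
use strict;
use warnings;
use Mojo::IOLoop;
use Mojo::Promise;

# Hypothetical wrapper (not part of Mojolicious): runs a blocking code ref
# in a forked subprocess and returns a promise for its return values.
sub in_subprocess {
    my ($code, @args) = @_;
    my $promise = Mojo::Promise->new;
    Mojo::IOLoop->subprocess(
        sub { return $code->(@args) },    # runs in the child process
        sub {                             # runs in the parent afterwards
            my ($subprocess, $err, @results) = @_;
            return $promise->reject($err) if $err;
            $promise->resolve(@results);
        },
    );
    return $promise;
}

# Stand-in for a slow, blocking sub that must stay unmodified.
sub slow_mult2 { my ($n) = @_; sleep 1; return $n * 2 }

my @results;
in_subprocess(\&slow_mult2, 21)->then(sub { @results = @_ })->wait;
print "@results\n";    # prints "42"
```

Each call forks a process, so this is best kept for code that cannot be made non-blocking.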

Thank you. I need to study your code further. I certainly do not have a `sleep` in my code. I was just trying to illustrate that (it takes time). I need to attempt your example code without `Mojo::IOLoop->timer`. I'll look into `Mojo::IOLoop->subprocess(...)` as well. — h q, Apr 08 '18 at 11:43
It would be useful to describe how to wait for a subprocess, to coordinate when to apply the power call. — DavidO, Apr 08 '18 at 17:09
@DavidO I updated with a subprocess example. You can keep using promises to run the code in the proper sequence. — amon, Apr 08 '18 at 17:50
@amon : Thanks for that. Given your example I took the additional step of divorcing the business logic code from the controller, so that the slow subs can live without modification, while still leveraging Mojo::IOLoop::subprocess and ::promise. See https://gist.github.com/daoswald/17c1c37de52c700d794dc867cae9ca49 — DavidO, Apr 09 '18 at 05:32
@DavidO Where possible, it's better to go fully async instead of trying to parallelize synchronous operations. E.g. forking a subprocess is comparatively expensive and requires you to serialize the return values. Some resources (often, database connections) cannot be reused in the subprocess. In your case, you are accidentally still calling the `power()` function synchronously which blocks all requests! So I think your approach is viable as a migration strategy to fully async code, but it doesn't quite get the full benefits yet. — amon, Apr 09 '18 at 09:06
Totally agree that purpose built asynchronous calls will be more ideal in several regards, and it was intentional to call power synchronously in my example. I just wanted to see what it would look like if we have to assume the OP's methods are untouchable. Thanks for your answer, it was insightful. — DavidO, Apr 09 '18 at 13:50
Thank you @amon for your insightful solution. Thanks DavidO for your example. — h q, Apr 20 '18 at 08:12

score 1 · Answer 2 · answered Apr 09 '18 at 22:27

Concurrent vs Parallel

In order for two events to occur simultaneously (in parallel), you require multiple processing units.

For instance, with a single CPU you can only carry out one mathematical operation at a time, so such operations have to run in series regardless of the concurrent themes in your code.

Non-mathematical operations such as input/output (e.g. network, HDD) can occur in parallel, as these are, for the most part, independent of your single CPU (I'll leave multi-core systems out of the explanation since, generally speaking, Perl is not optimised for their use).

Hypnotoad, Mojolicious' production web server, relies on proper use of non-blocking I/O. For that reason, Mojolicious provides a non-blocking user agent as part of your controller.

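$controller->ua->get(
    $the_url,
    sub {
        my ( $ua, $tx ) = @_;
        # $tx->error covers connection errors as well as 4xx/5xx responses
        if ( my $err = $tx->error ) {
            # handle the error, e.g. log $err->{message}
        }
        else {
            my $result = $tx->result;
            # do stuff with the result
        }
    }
);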

You can implement Mojo::Promise here to improve the flow of your code.
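
As a rough sketch of what that could look like, the helper below starts one non-blocking GET per URL with get_p and combines them with Mojo::Promise->all. The name fetch_all is hypothetical, and the sketch assumes a Mojolicious version that provides Mojo::UserAgent->get_p.

```perl
use strict;
use warnings;
use Mojo::Promise;
use Mojo::UserAgent;

# Hypothetical helper: starts one non-blocking GET per URL and returns a
# promise that resolves with the response objects, in the order of @urls.
sub fetch_all {
    my ($ua, @urls) = @_;

    # All requests are started here, before any of them has finished.
    my @requests = map { $ua->get_p($_) } @urls;

    return Mojo::Promise->all(@requests)->then(sub {
        # Each argument is an array ref holding one transaction object.
        # Connection errors reject the promise instead of arriving here;
        # 4xx/5xx responses still arrive and need checking by the caller.
        return map { $_->[0]->res } @_;
    });
}
```

Inside an action this might be used as `fetch_all($c->ua, @urls)->then(sub { ... $c->render(...) })` after calling `$c->render_later`, with a `catch` callback to report failed requests.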

If possible, I would recommend using the non-blocking UA when fetching data from "external systems". If your Hypnotoad worker processes block for too long (longer than the configured heartbeat_timeout), they are likely to be killed off and replaced, which may disrupt your system.

Thank you so much. I agree with you, I won't be running mathematical operation, but fetch data from external systems. Would you be so kind to provide a working example? — h q, Apr 20 '18 at 07:32
