AnyLogic: Improving computational performance of a network model

Question

I'm working with an agent-based model of an epidemic. The idea is that individual agents make decisions based on what they observe in their networks (distance-based). I have several functions within each agent that dynamically update counts of infected contacts, contacts showing a particular behaviour etc.

The code below is for counting infected contacts within an agent's network.

int infectedConnections = 0;

if (getConnections() != null) {
    for (Agent a : this.getConnections()) {
        Person p = (Person) a;

        if (p.IsCurrentlyInfected()) {
            infectedConnections++;
        }
    }
}

return infectedConnections;

There will be at least 3 more such functions that keep counts of other agents expressing other features within an agent's network. This seems to run okay when I have <500 agents, but when I increase the agent population to about 1,000, the model becomes extremely slow. I'm looking to simulate at least 5,000 agents, and at that size the model doesn't even initialise.

Is there a more computationally efficient way to track network statistics in AnyLogic for larger populations?

You could do with a better title for this question; it's very general/vague at the moment. — Stuart Rossiter, Nov 15 '17 at 22:28

score 2 · Answer 1 · answered Nov 15 '17 at 15:45

The model does not initialize because the default memory allocation is not enough for 5000 agents. If each agent is connected to all other agents (4999 connections per agent), the model takes more than 1,300 MB of RAM, while the default simulation experiment allocates just 512 MB. Increase the memory amount in the experiment properties. With that change, the code takes about 1 real second to run for all 5000 agents. In other words, if I collect statistics every model second, the maximum execution speed is about 1 model second per real second.

You can speed this up by rewriting the code with the Java Stream API:

return (int) getConnections().stream()
    .filter(a -> ((Person) a).IsCurrentlyInfected())
    .count();

Then 1 model second is executed in about 0.5 real seconds (a 2x gain). If statistics collection is performed in parallel (with multiple threads created by Java code), you may get a corresponding gain, depending on the number of cores in the PC. Either way, this is a computational complexity issue, so you need to change the approach (see @pjs's answer); otherwise performance will remain poor.
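As a rough illustration of the stream and parallel variants mentioned above, here is a minimal self-contained sketch. The `Person` class here is a hypothetical stand-in, not the AnyLogic agent, and `parallelStream()` is used as one simple way to spread the counting over several cores; it is not necessarily how the timings above were obtained. It assumes no agent changes state while the count is running.

```java
import java.util.ArrayList;
import java.util.List;

class StreamCountSketch {
    // Hypothetical stand-in for the model's Person agent (not an AnyLogic class).
    static class Person {
        private final boolean infected;
        Person(boolean infected) { this.infected = infected; }
        boolean IsCurrentlyInfected() { return infected; }
    }

    // Sequential stream version of the counting loop.
    static int countInfected(List<Person> connections) {
        return (int) connections.stream()
                .filter(Person::IsCurrentlyInfected)
                .count();
    }

    // Parallel variant: assumes no agent state is modified while counting.
    static int countInfectedParallel(List<Person> connections) {
        return (int) connections.parallelStream()
                .filter(Person::IsCurrentlyInfected)
                .count();
    }

    // Builds n illustrative connections in which every fifth one is infected.
    static List<Person> sampleConnections(int n) {
        List<Person> list = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            list.add(new Person(i % 5 == 0));
        }
        return list;
    }

    public static void main(String[] args) {
        List<Person> connections = sampleConnections(4999);
        System.out.println(countInfected(connections));
        System.out.println(countInfectedParallel(connections));
    }
}
```

Both methods return the same count; whether the parallel one is actually faster depends on the list size and the number of cores, so it is worth measuring in the real model.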

Nice concrete answer regarding the initialization issue. — pjs, Nov 15 '17 at 16:54

score 1 · Accepted Answer · answered Nov 15 '17 at 00:55

Your result that things bog down somewhere between 1000 and 5000 agents is pretty common with the agent-based models I've seen. It's a basic computational complexity issue. With N agents, the number of 2-way interactions is N choose 2, i.e. N(N-1)/2, which is O(N^2). 5000 agents is approximately 25 times as much work as 1000 agents.
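To make the factor of 25 concrete, the pair counts for the two population sizes work out as follows (plain arithmetic on the formula above, nothing model-specific):

```latex
\binom{N}{2} = \frac{N(N-1)}{2}, \qquad
\binom{1000}{2} = 499{,}500, \qquad
\binom{5000}{2} = 12{,}497{,}500, \qquad
\frac{12{,}497{,}500}{499{,}500} \approx 25.02
```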

You can pull some stunts with localization. Basically, divide your sandbox into different playing areas based on the fact that agents in a particular area can't interact with agents in other areas, so you only need to check for a subset of the interactions. Dividing the N agents into k independent groupings, if possible, will yield an O(k)-fold improvement in run times.
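A minimal sketch of the bookkeeping behind that claim, assuming each agent has already been assigned a region ID and that agents in different regions never interact. The class and method names are hypothetical, and the code only counts the pairwise checks required; it does not perform them.

```java
import java.util.HashMap;
import java.util.Map;

class PartitionSketch {
    // Number of unordered pairs among n agents: n choose 2.
    static long pairChecks(long n) {
        return n * (n - 1) / 2;
    }

    // Pair checks needed when agents only interact within their own region.
    // regionOfAgent[i] is the (hypothetical) region ID assigned to agent i.
    static long partitionedPairChecks(int[] regionOfAgent) {
        Map<Integer, Long> regionSizes = new HashMap<>();
        for (int region : regionOfAgent) {
            regionSizes.merge(region, 1L, Long::sum);
        }
        long total = 0;
        for (long size : regionSizes.values()) {
            total += pairChecks(size);
        }
        return total;
    }

    // Spreads n agents evenly over k regions (illustrative assignment only).
    static int[] evenRegions(int n, int k) {
        int[] regions = new int[n];
        for (int i = 0; i < n; i++) {
            regions[i] = i % k;
        }
        return regions;
    }

    public static void main(String[] args) {
        int n = 1000;
        int k = 10;
        System.out.println("all pairs:         " + pairChecks(n));
        System.out.println("within-region only: " + partitionedPairChecks(evenRegions(n, k)));
    }
}
```

With 1000 agents split evenly into 10 regions, the within-region total is 10 x (100 choose 2), which is about a tenth of the unpartitioned count. Uneven regions give a smaller gain.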

Another alternative might be to move away from a time-step framework and work out an event-based design for your problem. You can find an example of this approach in this paper.

Stuart Rossiter · Answer 3 · 2021-02-23T15:53:09.667

As the other answers cover, your question is really two questions:

1. the memory usage and 'underlying' model speed due to non-linearly-increasing total network connections as the number of agents increases (since each agent is connected to every other agent);
2. your stats gathering efficiency.

I'm surprised no-one mentioned it, but the main performance issue with the latter is that (it seems) you are recalculating the statistic whenever it is needed (and you don't specify how often that is) rather than just maintaining the counts as the states that affect them change.

(This is a general programming trade-off of (a) minimising memory and avoiding potential bugs in not updating counts at all the appropriate times [as in your approach] vs. (b) speed by retaining counts and only updating them when events they depend on occur.)

So just have each agent update the count in its connected agents whenever it changes from non-infected to infected or vice versa.
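A minimal sketch of that idea in plain Java, outside AnyLogic. The `Person` class, `connectTo`, and `setInfected` are hypothetical names for illustration; in a real model the update would go wherever the infection state actually changes (for example a statechart transition action), and connections added or removed at runtime need the same care.

```java
import java.util.ArrayList;
import java.util.List;

class IncrementalCountSketch {
    // Hypothetical stand-in for the model's Person agent.
    static class Person {
        private final List<Person> connections = new ArrayList<>();
        private boolean infected;
        private int infectedConnections; // maintained incrementally, never recomputed

        // Creates a two-way link and keeps both cached counts consistent.
        void connectTo(Person other) {
            connections.add(other);
            other.connections.add(this);
            if (this.infected) other.infectedConnections++;
            if (other.infected) this.infectedConnections++;
        }

        // Call this on every infection state change; it notifies all neighbours.
        void setInfected(boolean nowInfected) {
            if (nowInfected == infected) {
                return; // no transition, so no counts change
            }
            infected = nowInfected;
            int delta = nowInfected ? 1 : -1;
            for (Person p : connections) {
                p.infectedConnections += delta;
            }
        }

        // Constant-time read: no looping over connections when the value is needed.
        int getInfectedConnections() {
            return infectedConnections;
        }
    }

    public static void main(String[] args) {
        Person a = new Person();
        Person b = new Person();
        Person c = new Person();
        a.connectTo(b);
        a.connectTo(c);
        b.connectTo(c);

        a.setInfected(true);
        b.setInfected(true);
        System.out.println(c.getInfectedConnections());
    }
}
```

The cost moves from every read to every state change, which is the trade-off described above: reads become cheap, but every place that changes the infection state must go through the updating method.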

As an illustration, imagine you have 10 agents, and so 9 connections per agent (90 connections total). Let's say an agent changes to/from infected every 10 simulated minutes (on average) and you run for 60 mins. And that you update the 'infected connections count' in each agent every minute. (If you were being efficient, this interval would be the minimum period between possible transitions but it may well be that you are querying it much more frequently than that, or that the minimum period is very small.)

With your method, you will check 90 connections 60 times (so 5400 accesses to agents, as well as the overhead of getting/looping through the connections for each agent).

With my method, there will be 6x10 = 60 relevant transitions, and so 60x9 = 540 accesses to agents (to increment/decrement counts), plus only getting/looping through an agent's connections 60 times instead of 60x10 = 600 times. So a >10x performance improvement.

The efficiency gain obviously improves the rarer transitions are compared to how often you need to use the statistic (and vice versa), so in some contexts the two methods will perform similarly.
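The same comparison can be written in general form. The symbols are introduced here for illustration only: N agents, c connections per agent, Q queries of the statistic per agent, and T state transitions per agent over the run.

```latex
\text{recompute on demand: } N \cdot c \cdot Q = 10 \cdot 9 \cdot 60 = 5400
\qquad
\text{update on change: } N \cdot c \cdot T = 10 \cdot 9 \cdot 6 = 540
\qquad
\frac{N c Q}{N c T} = \frac{Q}{T} = \frac{60}{6} = 10
```

So, counting agent accesses only, the gain is the ratio of queries to transitions, and it disappears when Q is close to T.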

It results in the same number of operations overall, so it may make execution smoother, but the total time is the same. — Gregory Monahov, Nov 16 '17 at 08:50
@GregoryMonahov How can it be the same? From what he wrote, it seems he has *extra* daily events doing all the looping to calculate the daily counts (instead of just updating them as the changes occur with no extra looping). Are we seeing the problem differently? — Stuart Rossiter, Nov 16 '17 at 13:48
If I understand your suggestion correctly, each agent should notify its connected agents about changes to its Infectious flag, so that they know how many infectious connected agents they have. Imagine they update them at the same time (it is part of the model logic, not part of statistics collection). The number of operations is the same whether "notifications" are sent to each connected agent at every infectious update or the count is calculated per agent as it is now. — Gregory Monahov, Nov 16 '17 at 14:54
@GregoryMonahov No, because he is updating the counts every time he needs them rather than just every time they change. See clarified answer. (Only 3 years later!) — Stuart Rossiter, Feb 23 '21 at 15:53
