When I was dealing with this I looked into loop detection using matrix methods, etc.:
https://en.wikipedia.org/wiki/Adjacency_matrix#Matrix_powers
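(For reference, a minimal NumPy sketch of that trick, with hypothetical names: trace(A^k) counts closed walks of length k, so a nonzero trace for any k <= n means the graph contains a cycle.)

```python
import numpy as np

def has_cycle(adj: np.ndarray) -> bool:
    """adj[i, j] == 1 means there is a connection from node i to node j."""
    n = adj.shape[0]
    power = np.eye(n)
    for _ in range(n):
        power = power @ adj          # power is now adj^k
        if np.trace(power) > 0:      # a closed walk of length k exists
            return True
    return False

# Example: 0 -> 1 -> 2 -> 0 forms a cycle
adj = np.array([[0, 1, 0],
                [0, 0, 1],
                [1, 0, 0]])
print(has_cycle(adj))  # True
```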
But I found that the best way to feed inputs forward and get outputs was loop detection using a timeout/propagation delay at each node.
A feedforward implementation is simple, and I started from there:
Wait until all incoming connections to a node have a signal, then sum-squash (activate) and send the result along all of that node's outgoing connections. Start from the input nodes, which already have a signal from the input vector. Once there are no more nodes to process, manually 'shunt' the output nodes with a sum-squash operation to get the output vector. (Sketch below.)
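A rough Python sketch of that feedforward pass (the function names and the sigmoid squash are mine, just for illustration; the output nodes are processed in the same loop rather than shunted separately):

```python
import math
from collections import defaultdict

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def feed_forward(inputs, input_nodes, output_nodes, connections):
    """connections: list of (src, dst, weight). Assumes an acyclic topology."""
    incoming = defaultdict(list)
    for src, dst, w in connections:
        incoming[dst].append((src, w))

    signals = dict(zip(input_nodes, inputs))   # input nodes already have a signal
    pending = {n for n in incoming if n not in signals}

    progressed = True
    while pending and progressed:
        progressed = False
        for node in list(pending):
            srcs = incoming[node]
            # wait until every incoming connection has delivered a signal
            if all(s in signals for s, _ in srcs):
                signals[node] = sigmoid(sum(signals[s] * w for s, w in srcs))
                pending.discard(node)
                progressed = True

    return [signals.get(n, 0.0) for n in output_nodes]
```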
For circularity (the traditional NEAT implementation) I did the same as feedforward, with one more feature:
Calculate the 'maximum possible loop size' of the network. An easy upper bound is ~2 * (total number of nodes): a walk from an input to any node that doesn't cycle visits each node at most once, so a node MUST receive all of its signals within that many time steps unless one of its inputs is part of a cycle.
Then wait until all incoming connection signals arrive at a node OR a timeout occurs (a signal has not arrived on a connection within 'maximum loop size' steps). If a timeout occurs, label the incoming connections that still have no signal as recurrent.
Once a connection is labelled recurrent, restart the timers on all nodes (to prevent a node later in the detected cycle from being labelled recurrent purely because of propagation latency).
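A condensed sketch of that detection pass, with one liberty taken: instead of explicit per-node timers it treats 'a full pass in which no node could fire' as the timeout, which is what the timer condition collapses to in a synchronous simulation. A fixed node ordering stands in for which node times out first, so the labelling stays deterministic.

```python
from collections import defaultdict

def label_recurrent(all_nodes, input_nodes, connections):
    """connections: iterable of (src, dst) pairs.
    Returns the set of connections labelled recurrent."""
    incoming = defaultdict(list)
    for src, dst in connections:
        incoming[dst].append(src)

    has_signal = set(input_nodes)            # input nodes start with a signal
    pending = [n for n in all_nodes if n not in has_signal]
    recurrent = set()

    while pending:
        progressed = False
        for node in list(pending):
            missing = [s for s in incoming[node]
                       if s not in has_signal and (s, node) not in recurrent]
            if not missing:                  # all live inputs arrived: fire
                has_signal.add(node)
                pending.remove(node)
                progressed = True
        if not progressed:
            # "timeout": nothing could fire, so the first blocked node gets
            # its missing incoming connections labelled recurrent, then the
            # whole pass retries (the equivalent of restarting all timers)
            blocked = pending[0]
            for s in incoming[blocked]:
                if s not in has_signal:
                    recurrent.add((s, blocked))
    return recurrent
```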
Now forward propagation is the same as in the feedforward network, except: don't wait for connections that are recurrent, and sum-squash as soon as all non-recurrent connections have arrived (using 0 for recurrent connections that don't have a signal yet). This ensures the first node reached in a cycle is the one marked recurrent, making the labelling deterministic for any given topology, and recurrent connections pass their data to the next propagation time step.
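And the matching propagation step, reusing the labels from the sketch above (again hypothetical, with sigmoid as the squash): non-recurrent connections are waited on as before, while recurrent connections read the previous time step's activations, or 0.0 on the first step.

```python
import math
from collections import defaultdict

def propagate_step(inputs, input_nodes, all_nodes, output_nodes,
                   connections, recurrent, prev_signals=None):
    """connections: list of (src, dst, weight); recurrent: set of (src, dst).
    Assumes `recurrent` came from the detection pass on the same topology,
    so every node is guaranteed to fire eventually."""
    prev_signals = prev_signals or {}
    incoming = defaultdict(list)
    for src, dst, w in connections:
        incoming[dst].append((src, w))

    signals = dict(zip(input_nodes, inputs))
    pending = [n for n in all_nodes if n not in signals]

    while pending:
        for node in list(pending):
            live = [(s, w) for s, w in incoming[node] if (s, node) not in recurrent]
            if all(s in signals for s, _ in live):
                total = sum(signals[s] * w for s, w in live)
                # recurrent inputs contribute last step's value (0.0 at t = 0)
                total += sum(prev_signals.get(s, 0.0) * w
                             for s, w in incoming[node] if (s, node) in recurrent)
                signals[node] = 1.0 / (1.0 + math.exp(-total))  # sum-squash
                pending.remove(node)
    return signals, [signals[n] for n in output_nodes]
```

To run multiple time steps, feed the returned `signals` back in as `prev_signals` on the next call.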
This has some first-time overhead but is concise and produces the same results for a given topology each time it's run. Note that this only works when every node has a path to an output, so you can't necessarily disable split connections (connections created by node-addition operations) and prune randomly during evolution without taking that into consideration.
(P.S. This also creates a traditional residual/recurrent network that in theory could be implemented trivially as matrix operations. If I had large networks I would first 'express' the network by running forward propagation once to get the recurrent connections, then build a 'tensor per layer' representation for matrix-multiplication operations from the recurrent, weight, and signal connection attributes, with the recurrent attribute as a sparse binary mask. I actually started writing a TensorFlow implementation that performed all mutation/augmentation operations with tf.sparse matrix operations and didn't use any tree objects, but I had to fall back to dense operations, and the n^2 space consumed is too much for what I need. Being in matrix form did, however, allow the use of the aforementioned adjacency-matrix-powers trick! At least one other person on GitHub has done NEAT in TensorFlow, but I'm unsure of their implementation. Also, I found this interesting: https://neat-python.readthedocs.io/en/latest/neat_overview.html)
Happy Hacking!