The benefit of using a Bayesian network is precisely that it lets us apply the chain rule cheaply. The network can be thought of as a compact representation of a huge lookup table that gives the probability of every possible joint event over the variables it contains. Because some events are conditionally independent of others, we don't need to store this huge table; we can instead distribute it across the individual nodes of the network.
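To get a feel for the size difference, here is a rough sketch that counts table entries. It assumes every variable is binary, and the 20-node network with at most 3 parents per node is a made-up example, not something from your question.

```python
def full_joint_entries(n_nodes):
    """Independent entries in one joint table over n binary variables."""
    return 2 ** n_nodes - 1


def network_entries(parent_counts):
    """Independent entries when each binary node stores
    P(node = True | parents) for every combination of parent values."""
    return sum(2 ** k for k in parent_counts)


# Hypothetical network: 20 binary nodes, none with more than 3 parents.
parent_counts = [0, 1, 2] + [3] * 17

print(full_joint_entries(20))          # 1048575
print(network_entries(parent_counts))  # 143
```

The full table grows exponentially in the number of nodes, while the per-node tables grow exponentially only in the number of parents of each node.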
If you think of each node of a Bayesian network as storing a probability lookup table (i.e., the probability of observing the event represented by that node, given each possible combination of values of its parent nodes), each table is small compared with a single joint table over the whole network. The entire network then consists of just these small tables, linked by the parent-child relationships. When you need a joint probability (i.e., P(A_1 ... A_n) from above), you can iterate efficiently over the nodes, using the chain rule, and multiply together one table entry per node to get the probability of the observation given the structure of the network.
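As a minimal sketch of that calculation, assuming binary variables and a made-up three-node network (rain, sprinkler, wet grass) with invented probabilities, the joint probability is one table lookup per node:

```python
from itertools import product

# Hypothetical structure: rain -> sprinkler, and (sprinkler, rain) -> wet.
parents = {
    "rain": [],
    "sprinkler": ["rain"],
    "wet": ["sprinkler", "rain"],
}

# cpt[node][parent values] = P(node = True | parents); numbers are invented.
cpt = {
    "rain": {(): 0.2},
    "sprinkler": {(True,): 0.01, (False,): 0.4},
    "wet": {
        (True, True): 0.99,
        (True, False): 0.9,
        (False, True): 0.8,
        (False, False): 0.0,
    },
}


def joint_probability(assignment):
    """Multiply P(node | parents(node)) over all nodes (the chain rule)."""
    prob = 1.0
    for node, node_parents in parents.items():
        key = tuple(assignment[p] for p in node_parents)
        p_true = cpt[node][key]
        prob *= p_true if assignment[node] else 1.0 - p_true
    return prob


# P(rain, no sprinkler, wet) = 0.2 * 0.99 * 0.8, roughly 0.1584
p = joint_probability({"rain": True, "sprinkler": False, "wet": True})

# Sanity check: the joint probabilities of all 8 assignments sum to 1.
names = list(parents)
total = sum(
    joint_probability(dict(zip(names, values)))
    for values in product([True, False], repeat=len(names))
)
```

Only 1 + 2 + 4 = 7 numbers are stored here, and each joint probability costs three lookups and three multiplications.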
Note that it is the structure of the network that provides this saving. In your example, the "parents(A_1)" clause is just a subset of the entire set of nodes. The structure implicitly tells us that A_1 is conditionally independent of its non-descendants in the network, given the values of its parents. So for each node we only need to condition on the small set of nodes that can directly affect it.
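Written out, and assuming the nodes are numbered so that every parent comes before its children, the general chain rule and the simplification the network structure allows are:

```latex
P(A_1, \ldots, A_n)
  = \prod_{i=1}^{n} P(A_i \mid A_1, \ldots, A_{i-1})
  = \prod_{i=1}^{n} P(A_i \mid \mathrm{parents}(A_i))
```

The first equality holds for any joint distribution; the second is where the conditional independence encoded by the network is used.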
The small amount of computation this requires is generally a modest price to pay for the huge space saving that you obtain by using this structure.