TLDR: When troubleshooting a problem with many variables, is there a reason to choose an additive approach over a subtractive one, or vice versa?
Problem overview:
I am working with a group of people trying to troubleshoot an intermittent but high-impact issue in a staging system. The issue is preventing us from going live with this new configuration.
We have a Citrix XenApp application server running on a virtualized infrastructure serving applications to clients running at remote sites over a WAN. There are several encryption/security/firewall devices at the head end of the network between the WAN and the physical server(s) hosting the virtualized servers.
In short, we have a problem with many variables. So far we have taken a subtractive approach: removing one component from the system at a time and checking whether the problem goes away, so that we can rule that component in or out. We are not having much luck with this approach. I was thinking of suggesting an additive approach instead, where we start with the bare minimum of system components that the app will work under and then add components back in various combinations.
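To make the two approaches concrete, here is a minimal Python sketch of how I picture them. Everything in it is hypothetical: the component names, the `toy_fails` fault model, and the assumption that a single test run reliably shows whether the problem is present. With an intermittent issue like ours, each test would really have to run long enough, or often enough, to be trusted. It is a toy model, not a description of our actual system.

```python
from typing import Callable, FrozenSet, Iterable, List

Config = FrozenSet[str]


def subtractive(components: List[str], fails: Callable[[Config], bool]) -> List[str]:
    """Remove one component at a time from the full system.

    Returns the components whose removal makes the problem disappear.
    """
    full = frozenset(components)
    return [c for c in components if not fails(full - {c})]


def additive(baseline: Iterable[str], components: List[str],
             fails: Callable[[Config], bool]) -> List[str]:
    """Start from a minimal working baseline and add components one at a time.

    A component that introduces the problem is recorded as a suspect and
    left out; a component that does not is kept for the following steps.
    """
    enabled = set(baseline)
    suspects = []
    for c in components:
        if fails(frozenset(enabled | {c})):
            suspects.append(c)
        else:
            enabled.add(c)
    return suspects


# Hypothetical component names, for illustration only.
COMPONENTS = ["firewall", "encryption", "ids"]


def toy_fails(config: Config) -> bool:
    # Assumed fault model: two independent causes, either one is enough.
    return "firewall" in config or "encryption" in config


subtractive_result = subtractive(COMPONENTS, toy_fails)
additive_result = additive([], COMPONENTS, toy_fails)
print("subtractive:", subtractive_result)  # subtractive: []
print("additive:", additive_result)        # additive: ['firewall', 'encryption']
```

In this toy case, removing one component at a time never makes the problem go away, because the other cause is still in place, while building up from a working baseline flags both. That is only one possible fault model, and I do not know whether it resembles ours.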
Based on your experience, are there reasons to prefer additive over subtractive troubleshooting, or vice versa?