Although it is important to be able to optimize individual "sniff" implementations, my question isn't really about that.
What I really want to do is run my favorite dozen-or-so sniffs over my entire codebase (which contains several million lines of PHP code) in a couple of seconds.
Each of my favorite sniffs takes less than 2 seconds to run against any single file in my codebase. So in principle (assuming for the moment that I'm not using any "multi-file sniffs") there's no reason I couldn't break the problem down into a few hundred thousand "jobs", distribute them across a (plentiful) sea of "workers" to run in parallel, and aggregate the results, along the lines of the sketch below.
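To make the fan-out/aggregate idea concrete, here's a minimal single-machine sketch of what I mean (the path and sniff names are hypothetical placeholders; it assumes `phpcs` is on the PATH and that the installed version supports the `--sniffs` and `--report=json` options):

```php
<?php
// Hypothetical sketch, not a working distributed system: fan the file
// list out across a handful of local phpcs worker processes and merge
// the JSON reports afterwards.

$iterator = new RegexIterator(
    new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator('/path/to/codebase') // hypothetical path
    ),
    '/\.php$/'
);

// Keys of the iterator are the matched pathnames.
$files = array_keys(iterator_to_array($iterator));

// A few hundred files per job keeps process-startup overhead low
// (but watch the OS command-line length limit for very long lists).
$jobs = array_chunk($files, 500);

// Hypothetical sniff list; substitute your favorite dozen-or-so.
$sniffs = 'Generic.PHP.DisallowShortOpenTag,Squiz.PHP.Eval';

$maxWorkers = 8;
$results    = [];

foreach (array_chunk($jobs, $maxWorkers) as $batch) {
    $handles = [];

    // Launch one phpcs process per job; they run concurrently.
    foreach ($batch as $job) {
        $cmd = 'phpcs --sniffs=' . escapeshellarg($sniffs)
             . ' --report=json '
             . implode(' ', array_map('escapeshellarg', $job));
        $handles[] = popen($cmd, 'r');
    }

    // Aggregate: block on each worker in turn and merge its per-file results.
    foreach ($handles as $handle) {
        $report = json_decode(stream_get_contents($handle), true);
        pclose($handle);
        if (isset($report['files'])) {
            $results = array_merge($results, $report['files']);
        }
    }
}

printf("Aggregated results for %d files\n", count($results));
```

That gets me parallelism on one box, but what I'm really after is the same fan-out/aggregate pattern spread across many machines.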
I'm hoping that somebody has already done this (or something like it) using one of the several popular frameworks for building massively scalable applications, and has some practical advice to share.
EDIT:
Speed actually matters to me because I want to use CodeSniffer to do some static analysis on the source code when "building" a software release, and I want the whole build process to run in minutes rather than hours (or even days). I appreciate that this is not how CodeSniffer was originally designed to be used (e.g. as an IDE plugin that can show you potential issues in your code changes before you commit them), but I find that the flexibility of "sniffs" makes CodeSniffer an ideal tool for developing static analysis applications.