
Although it is important to be able to optimize individual "sniff" implementations, my question isn't really about that.

What I really want to do is run my favorite dozen-or-so sniffs over my entire codebase (which contains several million lines of PHP code) in a couple of seconds.

Each of my favorite sniffs takes less than 2 seconds to run against any single file in my codebase. So in principle (assuming for the moment that I'm not using any "multi-file sniffs") there's no reason I couldn't do this by breaking the problem down into a few hundred thousand "jobs", distributing them across a (plentiful) sea of "workers" to run in parallel, and aggregating the results.
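The chunk/fan-out/aggregate shape described above can be sketched as follows. This is a minimal illustration, not CodeSniffer's API: the stand-in checker, the batch size, and the file names are all assumptions, and a real worker would shell out to `phpcs` (as hinted in the comment) rather than inspect file names.

```python
from concurrent.futures import ThreadPoolExecutor

def chunk(items, size):
    """Split the file list into fixed-size jobs."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def check_batch(batch):
    """Worker: run the chosen sniffs over one batch of files.

    A real worker might shell out to phpcs, e.g.:
        subprocess.run(["phpcs", "--report=json", *batch], capture_output=True)
    Here a stand-in simply flags files whose name contains "bad".
    """
    return [(path, "issue found") for path in batch if "bad" in path]

def sniff_codebase(files, batch_size=100, workers=8):
    """Fan the jobs out to a pool of workers and aggregate the reports."""
    jobs = chunk(files, batch_size)
    results = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(check_batch, jobs):  # map preserves job order
            results.extend(partial)
    return results

if __name__ == "__main__":
    files = [f"src/file{i}.php" for i in range(5)] + ["src/bad_code.php"]
    print(sniff_codebase(files, batch_size=2, workers=2))
    # → [('src/bad_code.php', 'issue found')]
```

The same three-stage shape carries over to a genuinely distributed setup (a job queue feeding remote workers); only the body of `check_batch` would change.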

I'm hoping that somebody has already done this (or something like it) using one of the several popular frameworks for building massively scalable applications, and has some practical advice to share.

EDIT:

Speed actually matters to me because I want to use CodeSniffer to do some static analysis on the source code when building a software release, and I want the whole build process to run in minutes rather than hours (or even days). I appreciate that this is not the way CodeSniffer was originally designed to be used (e.g. as an IDE plugin which can show you potential issues in your code changes before you commit them), but I find that the flexibility of "sniffs" makes CodeSniffer an ideal tool for developing static analysis applications.

Peter
  • What is your issue? If you sniff the whole codebase, why does speed matter? I mean, when do you actually need to sniff the whole codebase? Maybe in the nightly build, for completeness reasons? Otherwise I suggest you let your IDE do it in the background while you are in certain files. Then you normally don't need to care about speed because it's done in parallel - you're not explicitly waiting for the outcome; the IDE just highlights the mistakes. You can also sniff only the changed files on commit. That's a similar principle for distributing the work. – hakre Jan 04 '13 at 22:35
  • You might also want to try out a dev branch I've been working on to improve performance over large code bases, although it is much more about memory improvements. Still, it might come in handy for you as well: https://github.com/squizlabs/PHP_CodeSniffer/tree/report-memory-improvements – Greg Sherwood Jan 05 '13 at 00:19
  • @hakre - I have updated my original question with a note about why I care about speed. If I were using CodeSniffer in a more "standard" way then speed would indeed not be much of a concern - thanks for pointing this out. – Peter Jan 05 '13 at 01:20
  • @Greg Sherwood - Thanks for the link. Right now I'm mostly concerned about speed, but if the complexity of my sniffs increases then I expect memory will become critical. – Peter Jan 05 '13 at 01:22

1 Answer


I, too, have been frustrated by the throughput of CodeSniffer. The best I could find was to remove the sniffs that I wasn't concerned about checking. I also suspect that some sniffs are more expensive than others, depending on how much parsing each has to do.
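For reference, phpcs can be restricted to a whitelist of sniffs on the command line via its `--sniffs` option; the standard, sniff codes, and path below are only examples:

```shell
# Run just the sniffs you care about instead of a whole standard
# (the sniff codes listed here are illustrative - substitute your own):
phpcs --standard=PSR2 \
      --sniffs=Generic.PHP.DisallowShortOpenTag,Squiz.PHP.Eval \
      src/
```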

Andy Lester
  • It is absolutely true that different sniffs have different performance, both in terms of CPU and memory. I would have expected the time required to run a sniff to be roughly proportional to the number of "callbacks" generated while processing the corpus and to the complexity of the processing for a single callback, but so far my tests haven't shown the expected linear relationships. So when choosing between different sniff implementation candidates, there's no way around running tests. But the question is not so much about tuning the sniffs as scaling the framework for running them. – Peter Jan 05 '13 at 01:31