Clearly, the absence of tests is going to make people nervous when you attempt to refactor the code: where will anybody get any confidence that your refactoring doesn't break the application? Most of the answers you get will amount to "this is going to be very hard and not very successful", largely because you face a huge manual task with no way to validate the result.
There are only two ways out.
Build a bunch of tests. Unfortunately, this costs a lot of time, and most managers see no value in it; after all, you've gotten along without tests so far. Pointing back to the confidence question won't help; you're still spending a lot of time before anything useful happens. If they do let you build tests, you'll have the problem of evolving the tests as you refactor: the refactorings may not change functionality one bit, but as you build new APIs the tests will have to change to match them. That's additional work beyond refactoring the code base.
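If you do get the green light, the cheapest tests to start with are characterization ("golden master") tests that simply pin down what the system does today. A minimal sketch in Python, where `compute_invoice_total` is an invented stand-in for a real legacy function:

```python
# Characterization test sketch (pytest style). The expected values are
# captured from the CURRENT system's output, not from a spec, so the
# test pins existing behavior, bugs included.

def compute_invoice_total(items, tax_rate):  # hypothetical legacy code
    return round(sum(qty * price for _, qty, price in items) * (1 + tax_rate), 2)

def test_invoice_total_characterization():
    cases = [
        (([("widget", 2, 9.99)], 0.00), 19.98),
        (([("widget", 2, 9.99), ("gizmo", 1, 5.00)], 0.07), 26.73),
    ]
    for (items, rate), expected in cases:
        assert compute_invoice_total(items, rate) == expected
```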
Automate the refactoring process. If you apply trustworthy automated transformations, you can argue (often unsuccessfully) that the refactored code preserves the original system's function. The way to beat the unsuccessful argument is to write those tests (see the first method) and apply the refactoring process to both the application and the tests: as the application's structure changes, the tests have to change too. From the point of view of the automated machinery, the tests are just more application code.
Not a lot of people do the latter; where do you get the tools that can do such things?
In fact, such tools exist. They are called program transformation tools and are used to carry out massive transformations on code.
Think of these as tools for literally refactoring in the large; because of scale, they tend not to be interactive.
It does take effort to configure them for the task at hand: you have to write custom rules to accomplish your particular desired result. You likely can't do this in a week, but it is a lot less work than manually modifying a large system. Consider, too, that you have 150 man-years invested in the existing software; it took that long to make the mess. It seems reasonable that some effort, small in comparison, should be acceptable.
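To make "custom rules" concrete, here is a toy version of the idea in Python, using its standard `ast` module: one rewrite rule that renames every call of a hypothetical `old_api` to `new_api`. Real transformation tools express such rules declaratively over full language grammars and apply them at scale; this only shows the shape of the work:

```python
# Sketch of a transformation "rule" as an AST rewrite. old_api/new_api
# are invented names standing in for whatever API change you need.
import ast

class RenameApiCall(ast.NodeTransformer):
    def visit_Call(self, node: ast.Call) -> ast.Call:
        self.generic_visit(node)  # rewrite nested calls first
        if isinstance(node.func, ast.Name) and node.func.id == "old_api":
            node.func = ast.Name(id="new_api", ctx=ast.Load())
        return node

def transform_source(source: str) -> str:
    tree = RenameApiCall().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)  # requires Python 3.9+

print(transform_source("result = old_api(x, old_api(y))"))
# -> result = new_api(x, new_api(y))
```

Because the same rule runs over application sources and test sources alike, the tests migrate in lockstep with the code, which is exactly the point of the second method above.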
I only know of 3 such tools that have a chance of working on real code: TXL, Stratego/XT, and our tool, the DMS Software Reengineering Toolkit. The first two are academic products (although TXL has been used for commercial activities in the past); DMS is commercial.
DMS has been used for a wide variety of large-scale software analysis and massive transformation tasks. One task was automated translation between languages for the B-2 Stealth Bomber. Another, much closer to your refactoring problem, was the automated re-architecting of a large-scale component-based system (C++ components) from a legacy proprietary RTOS, with its idiosyncratic rules about how components are organized, to CORBA/RT, in which the component APIs had to be changed from ad hoc structures to CORBA-style facet and receptacle interfaces, and CORBA/RT services had to be used in place of the legacy RTOS services. (These tasks were both done with 1-2 man-years of actual effort, by pretty smart and DMS-savvy guys.)
There's still the test-construction problem (both of the examples above already had great system tests). Here I'm going to go out on a limb. I believe there is hope in getting such tools to automate test generation by instrumenting running code to collect function input-output results. We've built all kinds of instrumentation for source code (obviously you have to compile it after instrumentation) and think we know how to do this. YMMV.
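To illustrate what that instrumentation amounts to, here is a hand-rolled Python sketch: a wrapper records each call's inputs and output, and the captured records can later be replayed as regression assertions. A transformation tool would weave equivalent probes into the source automatically; `price_after_tax` and the other names here are invented for illustration:

```python
# Capture function input-output pairs at runtime as raw material for
# generated regression tests.
import functools, json

captured = []  # one record per observed call

def record_io(fn):
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        result = fn(*args, **kwargs)
        captured.append({"fn": fn.__name__, "args": args,
                         "kwargs": kwargs, "result": result})
        return result
    return wrapper

@record_io
def price_after_tax(amount, rate):  # hypothetical legacy function
    return round(amount * (1 + rate), 2)

price_after_tax(100.0, 0.07)
print(json.dumps(captured))  # replay these records as assertions later
```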
There is something you can do that is considerably less ambitious: identify the reusable parts of the code by finding out what has already been duplicated in it. Most software systems contain a lot of cloned code (our experience is 10-20%; I'm surprised by the PHP report of smaller numbers in another answer, and I suspect they are using a weak clone detector). Cloned code is a hint of a missing abstraction in the application software. If you can find the clones and see how they vary, you can pretty easily see how to abstract them into functions (or whatever) to make them explicit and reusable.
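For a feel of what "see how they vary" means, consider two invented near-miss clones in Python and the parameterized function they suggest:

```python
# Two near-miss clones: same shape, differing only in key and record.
def get_ship_date(order):            # clone 1
    if "ship_date" in order:
        return order["ship_date"]
    return "1970-01-01"

def get_due_date(invoice):           # clone 2
    if "due_date" in invoice:
        return invoice["due_date"]
    return "1970-01-01"

# The variation points become parameters of one reusable function:
def get_field(record, key, default="1970-01-01"):
    return record.get(key, default)
```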
Salion Inc. did clone detection and abstraction. The paper doesn't explore the abstraction activity; what Salion actually did was a periodic review of the detected clones, with manual remediation of the egregious ones and of those that made sense to fold into (often library) methods. The net result was that the code base actually shrank in size and the programmers became more effective because they had better ("more reusable") libraries.
They used our CloneDR, a tool for finding clones by using the program syntax as a guide. CloneDR finds exact clones and near misses (differing by identifiers or statements) and provides a specific list of clone locations and clone parameterizations, regardless of layout and comments. You can see clone reports for a number of languages at the link. (I'm the originator and author of CloneDR, among my many hats.)
Regarding the "small clone percentage" for the PHP project discussed in another answer: I don't know what was being used for a clone detector. The only clone detector focused on PHP that I know is PHPCPD, which IMHO is a terrible clone detector; it only finds exact clones if I understand the claimed implementation. See the PHP example at our site for comparative purposes.