8

In trying to get an overview of how difficult some legacy C++ and C# code is to maintain, and of the risk of introducing bugs into it, it has been suggested that it would be useful to measure how widely or narrowly variables are scoped. The code uses a lot of globals and widely scoped variables where local ones would be better. A common occurrence is a variable that is used for only 2 or 3 lines of code, several scope levels in from where it is declared.
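To make the pattern concrete, here is a hypothetical sketch (the function and variable names are invented for illustration, not taken from the actual code base). The first function declares `scaled` at function level although it is only needed two scope levels in; the second narrows it to the block that uses it.

```cpp
#include <vector>

// Wide scope: `scaled` is visible in the whole function body,
// but it is only written and read inside the innermost block.
int sumScaledWide(const std::vector<int>& values, int factor) {
    int total = 0;
    int scaled = 0;  // declared far from its only use
    for (int v : values) {
        if (v > 0) {
            scaled = v * factor;
            total += scaled;
        }
    }
    return total;
}

// Narrow scope: same behaviour, but `scaled` lives only where it is used.
int sumScaledNarrow(const std::vector<int>& values, int factor) {
    int total = 0;
    for (int v : values) {
        if (v > 0) {
            const int scaled = v * factor;
            total += scaled;
        }
    }
    return total;
}
```

A scope metric would ideally flag the first version and stay quiet about the second.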

I know static code analysis tools usually try to quantify coupling and cohesion, but is there anything more specifically measuring variable/data scope?

Soner Gönül
krm

3 Answers

2

Yes, that's a standard technique of static analysis. It's called liveness analysis (or live variable analysis). The introductory example in this book performs such an analysis.

From the Wikipedia article about it:

In compiler theory, live variable analysis (or simply liveness analysis) is a classic data flow analysis performed by compilers to calculate for each program point the variables that may be potentially read before their next write, that is, the variables that are live at the exit from each program point.

Stated simply: a variable is live if it holds a value that may be needed in the future.
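As a rough illustration of the definition above, here is a minimal sketch of the backward pass for straight-line code only (no branches or loops, so no fixed-point iteration is needed). The `Stmt` type and the `liveIn` function are assumptions made up for this example and do not come from any particular tool.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// One statement, reduced to the variables it writes and reads.
struct Stmt {
    std::set<std::string> defs;  // variables written by the statement
    std::set<std::string> uses;  // variables read by the statement
};

// For each statement, compute the set of variables that are live just
// before it, walking backwards: live_in = uses + (live_out - defs).
std::vector<std::set<std::string>> liveIn(const std::vector<Stmt>& code) {
    std::vector<std::set<std::string>> result(code.size());
    std::set<std::string> live;  // nothing is live after the last statement
    for (std::size_t k = code.size(); k-- > 0;) {
        for (const auto& d : code[k].defs) live.erase(d);   // a write kills the old value
        for (const auto& u : code[k].uses) live.insert(u);  // a read makes it live
        result[k] = live;
    }
    return result;
}
```

For the sequence `x = 1; y = 2; print(x); x = y;` this yields the live-in sets `{}`, `{x}`, `{x, y}` and `{y}`. A variable whose live range covers only a few statements deep inside a much larger scope is a candidate for a narrower declaration.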

Peter Mortensen
gefei
  • "may be needed in the future" is not necessarily "is used in the future". Simple example: C-style for-loop `int i; for (i=0; i<10; ++i) meow(i);` after this loop, i is 10 and that value *may be needed*. In most cases, however, it is not needed/used. The compiler will consider it "live", but if I understand krm correctly, he wants a hint/warning that i's scope should be constrained to the for loop: `for (int i = 0; i<10; ++i) meow(i);` – Arne Mertz Jan 15 '13 at 11:08
  • @ArneMertz: it may be that "liveness analysis" is the wrong term for what I'm thinking of, but since it's based on data-flow analysis I believe that in your example `i` will be assessed as "dead" at the end of the loop provided that there are no reads from `i` between that point and either the first non-conditional write to `i` or the end of the scope of `i`, whichever comes first. "May be needed" takes into account (to the best of the analyzer's ability) the whole scope of the variable, not just the code preceding the program point at which its liveness is being assessed. – Steve Jessop Jan 15 '13 at 11:17
  • ... so for example if the next statement after the loop is `puts("foo");`, and `i` is dead, then the optimizer knows it doesn't need to preserve the value of `i` across the call, and maybe it can save some instructions and/or stack space as a consequence. That and register allocation are the uses of liveness I'm most familiar with, and I see gefei's point that it's useful for this question too. In your example, and assuming it's not used after the loop `i` is live *only* inside the loop (not before and not after), which is sufficient information to provoke the desired hint to reduce its scope. – Steve Jessop Jan 15 '13 at 11:22
  • That said, for another example `int i = 0; ...; for(;i < 10; ++i)` (where ... doesn't use `i`) it takes a little more than just the definition of liveness above to conclude that `i` can be reduced in scope. You also have to detect that the initialization of `i` can be delayed as late as its first read (accounting for branches that lead to conditional reads, of course, but this example doesn't have any of those). DFA typically does that too, but I don't know whether it's called "liveness" or something else. – Steve Jessop Jan 15 '13 at 11:27
0

I will concentrate on local variables in OO languages (Java, C#, C++). I can think of a number of measures concerning the scope of a local variable.

Local variable scope size
is the number of statements in which a local variable is accessible. It shouldn't be too large, as a large value indicates an overly long method. However, the method's statement count might be a more suitable measure of that.

Accessible local variable count
is the number of local variables accessible at each statement of a method. It shouldn't be larger than 3, since a larger number makes it harder to choose which local variable to use in an expression.

Local variable usage density
is the percentage of statements that access a local variable, out of the statements where that variable is accessible. Low values indicate that the method isn't very cohesive.

Coherent modification of local variables count
is the number of modifications of local variables within the same block. A high count indicates that several local variables belong together, so they should form an object of their own, thereby increasing cohesion.
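The first three measures can be sketched on a toy model in which statements are numbered and each local variable records the statement range where it is accessible. This is only an illustration under those assumptions: `LocalVar` and the three functions are hypothetical names, and a real tool would have to derive the ranges and accesses from a parsed syntax tree.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// A local variable, described by statement indices within one method.
struct LocalVar {
    std::string name;
    std::size_t scopeBegin;          // first statement where it is accessible
    std::size_t scopeEnd;            // one past the last such statement
    std::set<std::size_t> accesses;  // statements that read or write it
};

// Local variable scope size: number of statements where the variable is accessible.
std::size_t scopeSize(const LocalVar& v) { return v.scopeEnd - v.scopeBegin; }

// Local variable usage density: fraction of in-scope statements that access it.
double usageDensity(const LocalVar& v) {
    if (scopeSize(v) == 0) return 0.0;
    return static_cast<double>(v.accesses.size()) / scopeSize(v);
}

// Accessible local variable count at one statement.
std::size_t accessibleCount(const std::vector<LocalVar>& vars, std::size_t stmt) {
    std::size_t n = 0;
    for (const auto& v : vars)
        if (v.scopeBegin <= stmt && stmt < v.scopeEnd) ++n;
    return n;
}
```

For example, a variable in scope for statements 0 to 7 but accessed only at statements 5 and 6 has a scope size of 8 and a usage density of 0.25, which would suggest declaring it closer to its use.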

SpaceTrucker
  • Keeping the accessible local variable count no greater than 3 is going to be tricky: all three of the languages you cite have standard methods/functions with more than 3 parameters, and so can't be implemented with that restriction. In C++ a parameter explicitly is a local variable. I'm not sure what the others say, but parameters might as well be locals for the purpose of selecting which name you're supposed to use in a given expression. It's a noble goal, of course, and presumably you'd flag those standard functions as dangerous if asked to review the design :-) – Steve Jessop Jan 15 '13 at 13:41
  • @SteveJessop I didn't mean them to be hard restrictions. But I have to admit that I didn't think of parameters. So I guess 5 is a more appropriate value. – SpaceTrucker Jan 15 '13 at 13:51
  • I certainly agree that it's worth bearing in mind. In practice I would look at functions I like and functions I hate, and try to discern a guideline value from that. It might vary for different kinds of components since there are times when you're willing for the code to be a bit gnarly, and other times you want to keep everything very straightforward. – Steve Jessop Jan 15 '13 at 14:00
0

You can try CppDepend and its CQLinq code query language to detect global variables that are used by only one method or perhaps only one class.

from f in Fields where f.IsGlobal && f.MethodsUsingMe.Count()==1 select f