
In a busy repository, I can foresee some files that hold too much central logic being edited constantly.

Is there any way to find such files by asking Mercurial, whether through bundled extensions, third-party extensions, or external tools?

Basically, I'd like statistics showing which files are edited the most over time, so that I can find candidates for splitting (for example, refactoring the code into multiple files) and avoid constant merge pain on single files.

I'm aware of the churn extension, but it seems to focus only on how much each author changes in the repository, not on which files they change.

Tomerikoo
Lasse V. Karlsen

3 Answers


I don't think any of the churn, activity, or chart extensions does exactly that, though each is only a simple tweak away from it (they group by user, not by file).

You could use a loop like:

# Print a per-file diffstat for every revision, from 0 up to tip
for therev in $(seq 0 $(hg id -n -r tip)) ; do
  hg diff --change $therev --stat
done

And then total by file.
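
The totalling step could be sketched roughly as follows. This is a minimal sketch, assuming the usual `hg diff --stat` line layout of `path | count +++---`; the function name `total_by_file` is made up for illustration, and lines whose count is not a number (binary files, for instance) are skipped.

```shell
# Hypothetical helper: sum the changed-line counts per file from
# `hg diff --stat` output read on standard input.
total_by_file() {
  awk -F'|' '
    NF == 2 {
      # Per-file lines look like " path | 12 ++--"; summary lines have no "|".
      name = $1
      gsub(/^ +| +$/, "", name)
      # part[1] is the changed-line count; ignore anything non-numeric.
      split($2, part, " ")
      if (part[1] ~ /^[0-9]+$/) total[name] += part[1]
    }
    END { for (f in total) print total[f], f }
  ' | sort -rn
}
```

Piping the loop into it, as in `for therev in ...; do ...; done | total_by_file | head -n 10`, should list the ten files with the most changed lines first.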

Ry4an Brase
  • Suffice to say that your answer, and a bit of digging into the output and usage of the command line client made me start up a C# project for creating a wrapper of the command line for usage in .NET. A statistics package is going to be one of the outcomes of this. My class library can be found here: http://bitbucket.org/lassevk/mercurial.net - **Thanks for the inspiration!** – Lasse V. Karlsen Nov 11 '10 at 10:07
  • The class library has long since moved to codeplex: http://mercurialnet.codeplex.com/ - just editing this since I got a vote on the question today, so at least it garnered some attention. – Lasse V. Karlsen Mar 01 '11 at 17:00
  • Normally more C# code in the world would make me sad, but if it's solidifying Mercurial's place over git in the Csharpnetdot community so much the better! – Ry4an Brase Mar 02 '11 at 03:56

Based on Ry4an's answer, I created the following PowerShell script:

It ignores changesets that contain the word 'merge' in the first description line. A CSV file is generated. I open this in Excel and pivot the table to aggregate the changes per file.

$revisions = @{};

function GetFileChanges([int] $revision){
    try{
        $logDescription = hg log -r $revision --template '{desc|firstline}'        
        if ($logDescription.ToLower().Contains("merge")){
            write-output "Skipping merges " $logDescription
        } else {
            $fileChanges = hg diff --change $revision --stat  
            $fileModifications = @{};
            foreach($fileChange in $fileChanges){
                if ($fileChange){ # Skip empty lines; a changeset that only creates a branch changes no files.
                    $fileLineDetail = $fileChange.split('|');
                    $changes = select-string -InputObject $fileLineDetail[1] -pattern '(\d+)' |  % { $_.Matches } | % { $_.Value }      
                    if ($changes){         
                        $fileModifications.Add($fileLineDetail[0].trim(), [int] $changes);                 
                    }
                }
            }
            $revisions.Add($revision, $fileModifications);
        }       
    }
    catch [exception]
    {
        "caught an exception"
        write-error $revision
    }

}

# Starting point: the local revision number of a tag (substitute a tag from your own repository)
$previous = hg identify -r build-3.4.139.0 -n
$now = hg identify -r tip -n
for($i = [int] $previous; $i -le [int] $now; $i++){
    GetFileChanges $i
}

# hg diff -r 3610:tip --stat 

$exportTable = @();

foreach($key in $revisions.Keys){

  $revision2= $revisions[$key];
  foreach($file in $revision2.Keys){

     $tempreport = New-Object PSObject
     $tempreport | Add-Member -type NoteProperty -Name Revision -Value $key
     $tempreport | Add-Member -type NoteProperty -Name File -Value $file
     $tempreport | Add-Member -type NoteProperty -Name Changes -Value $revisions[$key][$file]
     $exportTable += $tempreport;
  }

}

$exportTable | export-csv "stats.csv" -noType 
Sentient

This is my take on "give me the 10 most modified files in the project's code base":

find . -name '*.java' | while IFS= read -r f; do c=$(hg log "$f" | grep -c '^changeset:'); echo "$c $f"; done | sort -n | tail -n 10

It takes a while to run (on a non-SSD disk, anyway), but it works perfectly.

For those who would like a walk-through, I retrieve a list of all Java source files under the current dir, retrieve and count Hg log entries for that file, output the number of log entries together with the file name, sort by changeset count and filter out everything but the 10 most modified files.

The approach could easily be modified to include files of a different type, a different SCM system, a specific date range etc. Bash and Hg at their finest. ;)
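
Because the loop runs `hg log` once per file, a single pass over the history may be quicker on large trees. Here is a minimal sketch, assuming the `{files}` template keyword is used to print each changeset's files one per line; `top_files` is a hypothetical helper name, and it counts changesets per path rather than changed lines.

```shell
# Hypothetical helper: read one file path per line on standard input and
# print the N most frequent paths (default 10) as "count path".
top_files() {
  awk '{ seen[$0]++ } END { for (f in seen) print seen[f], f }' |
    sort -rn | head -n "${1:-10}"
}
```

It could be fed with something like `hg log --no-merges --template '{files % "{file}\n"}' | grep '\.java$' | top_files 10`, and adding `--date` with a range to `hg log` would restrict the count to a period. Note that renamed or deleted files are counted under the paths they had at the time.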

Tomislav Nakic-Alfirevic