Using pygit2, I can get the total number of files changed, total insertions, total deletions, and the relative paths of the files (see the code below). However, I can't find a way to get the number of lines changed for each modified file, the way git diff --stat shows it. Can I do this in pygit2?

from pygit2 import Repository


def get_diff_stats(repo_path, commit_a, commit_b):
    repo = Repository("{}/.git".format(repo_path))
    diff = repo.diff(commit_a, commit_b)
    print(diff.stats.files_changed)
    print(diff.stats.insertions)
    print(diff.stats.deletions)

    for delta in diff.deltas:
        print(delta.new_file.path, delta.status_char())
Nasif Imtiaz Ohi

1 Answer

Not entirely sure if you still care, but I found your question and needed the answer myself. I ended up figuring it out by reading some rather dry documentation.

from colorama import Fore
from pygit2 import Patch, Repository

# repo_path, commit_a and commit_b are the same values as in the question.
# Repository opens an existing repository; init_repository is meant for creating one.
git_repo = Repository(repo_path)
diff = git_repo.diff(commit_a, commit_b, context_lines=0, interhunk_lines=0)

# Iterating over a Diff yields its entries. We care about patches (the changes in one file).
for obj in diff:
    if isinstance(obj, Patch):
        print(f"Found a patch for file {obj.delta.new_file.path}")

        # A hunk is the change in a file, plus some lines surrounding the change. This allows merging etc. in Git.
        # https://www.gnu.org/software/diffutils/manual/html_node/Hunks.html
        for hunk in obj.hunks:
            for line in hunk.lines:
                # The new_lineno represents the new location of the line after the patch. If it's -1, the line has been deleted.
                if line.new_lineno == -1:
                    print(f"[{Fore.RED}removal line {line.old_lineno}{Fore.RESET}] {line.content.strip()}")
                # Similarly, if a line did not previously have a place in the file, it's been added fresh.
                elif line.old_lineno == -1:
                    print(f"[{Fore.GREEN}addition line {line.new_lineno}{Fore.RESET}] {line.content.strip()}")

As you can see, a diff can contain multiple patches, so we need to loop over them. The Diff object behaves as a collection (this is not really clear from the documentation). Each Patch holds the changes to one file, and the actual changed lines can be found in its hunks. "Hunk" is a term from the GNU diffutils documentation and describes a change plus some surrounding context.
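
If you only need the per-file counts that git diff --stat prints, and not the lines themselves, a shorter route may be the line_stats attribute of each Patch, which pygit2 documents as a (context, additions, deletions) tuple. The following is a minimal sketch under that assumption. FileStat, per_file_stats and get_per_file_stats are names made up for this example and are not part of pygit2.

```python
from collections import namedtuple

# Hypothetical record type for this sketch; not part of pygit2.
FileStat = namedtuple("FileStat", ["path", "insertions", "deletions"])


def per_file_stats(diff):
    """Return one FileStat per patch found while iterating a pygit2 Diff."""
    stats = []
    for patch in diff:
        # line_stats is assumed to be a (context, additions, deletions) tuple.
        _context, insertions, deletions = patch.line_stats
        stats.append(FileStat(patch.delta.new_file.path, insertions, deletions))
    return stats


def get_per_file_stats(repo_path, commit_a, commit_b):
    """Open the repository at repo_path and collect per-file stats for a diff."""
    # Imported here so per_file_stats can be exercised without pygit2 installed.
    from pygit2 import Repository

    repo = Repository(repo_path)
    return per_file_stats(repo.diff(commit_a, commit_b))
```

Calling get_per_file_stats with the same repo_path, commit_a and commit_b as in the question should give one entry per changed file, which you can print as path, insertions and deletions. Binary files would presumably show up with zero insertions and deletions, since they have no line-level changes.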

Mies van der Lippe