
I ran JGit's log command and got back some RevCommit objects. I can read some basic information from them, and I use the following code to list the files that changed. There are two more things I need, though:

1) How do I get the information below when the commit doesn't have a parent?

2) How do I get the diff of the content that changed in each file?

RevCommit commit = null; // placeholder: one of the RevCommit objects returned by the log command

RevWalk rw = new RevWalk(repository);

RevCommit parent = null;
if (commit.getParent(0) != null) {
   parent = rw.parseCommit(commit.getParent(0).getId());
}

DiffFormatter df = new DiffFormatter(DisabledOutputStream.INSTANCE);
df.setRepository(repository);
df.setDiffComparator(RawTextComparator.DEFAULT);
df.setDetectRenames(true);

List<DiffEntry> diffs = df.scan(parent.getTree(), commit.getTree());
for (DiffEntry diff : diffs) {
   System.out.println(getCommitMessage());

   System.out.println("changeType=" + diff.getChangeType().name()
           + " newMode=" + diff.getNewMode().getBits()
           + " newPath=" + diff.getNewPath()
           + " id=" + getHash());
}
Rüdiger Herrmann
Coder

1 Answer


1) Use the overload of scan that takes AbstractTreeIterator arguments, and call it as follows when there is no parent:

df.scan(new EmptyTreeIterator(),
        new CanonicalTreeParser(null, rw.getObjectReader(), commit.getTree()));

A commit has no parent only if it is a root commit, which is normally the initial commit. In that case the "before" side of the diff is empty.

2) If you want git diff style output, use the following:

df.format(diff);

The diff is written to the output stream passed to the constructor. To get each diff individually, you could use one DiffFormatter instance per file, or one DiffFormatter with a ByteArrayOutputStream whose contents you read and then reset before formatting the next file. Roughly like this:

ByteArrayOutputStream out = new ByteArrayOutputStream();
DiffFormatter df = new DiffFormatter(out);
// ...
for (DiffEntry diff : diffs) {
    df.format(diff);
    String diffText = out.toString("UTF-8");
    // use diffText
    out.reset();
}

Note that Git does not know the encoding of files, so the charset has to be specified in the toString() call; UTF-8 is used here as a reasonable default. Depending on your use case, you may be better off skipping the decoding and working with the raw bytes from toByteArray().
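To illustrate the encoding point, here is a minimal sketch that does not use JGit at all. The class name DiffEncodingDemo and its helper methods are made up for this example, and the hard-coded bytes are only a stand-in for what a DiffFormatter might write for a file stored as ISO-8859-1. It shows that the same captured bytes decode differently depending on the charset you pick, while toByteArray() leaves them untouched.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DiffEncodingDemo {

    // Hypothetical stand-in for DiffFormatter output: the added line "+caf\u00e9",
    // encoded as ISO-8859-1 the way a Latin-1 file in a repository would be.
    static ByteArrayOutputStream latin1Diff() {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] bytes = "+caf\u00e9\n".getBytes(StandardCharsets.ISO_8859_1);
        out.write(bytes, 0, bytes.length);
        return out;
    }

    // Decode the captured bytes with an explicitly chosen charset.
    static String decode(ByteArrayOutputStream out, Charset charset) {
        return new String(out.toByteArray(), charset);
    }

    public static void main(String[] args) {
        ByteArrayOutputStream out = latin1Diff();
        System.out.println("JVM default charset: " + Charset.defaultCharset());

        String asLatin1 = decode(out, StandardCharsets.ISO_8859_1);
        String asUtf8 = decode(out, StandardCharsets.UTF_8);

        // The right charset preserves the accented character ...
        System.out.println("ISO-8859-1 keeps the accent: " + asLatin1.contains("\u00e9"));
        // ... while the wrong one substitutes U+FFFD for the invalid byte.
        System.out.println("UTF-8 yields a replacement char: " + asUtf8.contains("\uFFFD"));
        // The raw bytes are unaffected by any decoding choice.
        System.out.println("raw byte count: " + out.toByteArray().length);
    }
}
```

If the diff text is only passed on, for example written to a file or sent over the network, keeping the byte array avoids having to guess a charset at all.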


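A general note: Ensure that you always release the resources of RevWalk and DiffFormatter using release(). In newer JGit versions (4.0 and later, where these classes implement AutoCloseable), use close() or try-with-resources instead.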

robinst
  • Thank you for your reply, I will give this a shot when I get a chance. Do you mind elaborating more on your answer to number 2? I would like the standard diff output per file as given by a tool like git extensions or git k (just per file). Can you give some sample code on how I would do this? Thanks again. – Coder Sep 19 '12 at 17:40
  • Thank you very much for your solutions. I have tried them both and they work like a charm. My apologies for asking for more detail on number 2, I thought it was more complicated to implement. Also thank you for reminding me about calling release on the RevWalk. I was doing it in other code but forgot it here. Also thanks for letting me know about needing to release the DiffFormat. – Coder Sep 20 '12 at 10:40
  • Yeah, that's exactly how I implemented it before your post, lol. However, it works for me without specifying the encoding in toString(). Is this a coincidence? Is UTF-8 the default, and would it break for other encodings? – Coder Sep 20 '12 at 11:03
  • The default is the "default encoding" (see [API](http://docs.oracle.com/javase/6/docs/api/java/io/ByteArrayOutputStream.html#toString%28%29)), which depends on many things. Inspect it with `Charset.defaultCharset()`. But I don't recommend relying on that, as the encoding of the files in the Git repository is not related to the encoding of the JVM environment. And it could be different from repository to repository or even from file to file. In any case, `UTF-8` is a better guess than the default encoding. – robinst Sep 20 '12 at 12:53
  • Thanks, I have added the encoding to the toString. So there is no way to find out what the encoding is for the repository? That's a weird limitation. But I guess it's weirder that I have to use a ByteArrayOutputStream to get the diff information. There should just be a method that returns a string lol. That's one of the weird things I found about JGit. It's not very user friendly and not documented well. A lot of methods should just return values that you get out of normal git commands instead of streams that you constantly have to parse. At least it works. Thanks again. – Coder Sep 20 '12 at 15:16
  • A Git repository can store arbitrarily encoded files. It's in general not known what the encoding of a text file is, it has to be guessed from the contents or is sometimes provided by external metadata (but there's no general solution for that). Therefore it's not possible for Git (or JGit) to know the encoding. _Please_ read [this article by Joel Spolsky about encodings](http://www.joelonsoftware.com/articles/Unicode.html), you will benefit from that knowledge sooner or later :-). – robinst Sep 20 '12 at 18:06
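
As a rough illustration of "guessing from the contents", the sketch below tries a strict UTF-8 decode first and falls back to ISO-8859-1 when the bytes are not valid UTF-8. The class EncodingGuess and the method decodeGuessing are hypothetical names, and the choice of ISO-8859-1 as the fallback is an assumption for this example, not something Git or JGit prescribes. Such a heuristic can still guess wrong, which is exactly the limitation described above.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class EncodingGuess {

    // Try strict UTF-8 first; if the bytes are not valid UTF-8,
    // fall back to ISO-8859-1 (an assumed fallback, it accepts any byte).
    static String decodeGuessing(byte[] bytes) {
        CharsetDecoder strictUtf8 = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return strictUtf8.decode(ByteBuffer.wrap(bytes)).toString();
        } catch (CharacterCodingException e) {
            // Invalid UTF-8: reinterpret every byte as one Latin-1 character.
            return new String(bytes, StandardCharsets.ISO_8859_1);
        }
    }

    public static void main(String[] args) {
        byte[] utf8 = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        byte[] latin1 = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        // Both inputs decode to the same four-character string.
        System.out.println(decodeGuessing(utf8).equals("caf\u00e9"));
        System.out.println(decodeGuessing(latin1).equals("caf\u00e9"));
    }
}
```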