
For a commit-based research project in Software Engineering, the first task I've been asked to do is to link every commit to a version. So, for example, every commit made between version 1.1 and version 1.2 would be attributed to 1.1.

To do so, I collected all tags from the repo (there are several repos, but let's focus on Apache-Cli) together with the dates they were created, and then assigned each commit to a version simply by comparing dates.

With Pydriller, though, the numbers don't match. For example, for the tag refs/tags/cli-1.2-RC6, my method retrieves just 3 commits:

aa2434d301c49d100f50af544333886a6767ce9d
e07fd870ca76f478ffd17755e57cfc7bb5ff747e
f0fba7bff7de067e12a78169d1371f3773f3f5a7

since the date of the tag is Mar 11 02:28:29 2009 +0000. When I pass Pydriller the first and last commit of the version (aa... and f0...), it analyzes 124 commits, and of the 3 above only f0... is in the list.

From what I've understood, Pydriller follows a "tree structure" or something like that, and although I know I could give it an explicit list of commits (and I'd be fine with that), I guess my method of retrieving tags and assigning commits to them is wrong.
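To make the suspected mismatch concrete, here is a minimal sketch of how a date window and an ancestry-based range can select different commits. It does not use Pydriller or reproduce its internals; it assumes only the general git notion of a range (commits reachable from the end commit but not from the start commit). The toy graph, the integer "dates", and the helper names `ancestors`, `ancestry_range`, and `date_range` are all made up for illustration.

```python
# Toy history: each commit maps to its list of parents.
# "x" and "y" live on a side branch that is merged in by "m".
PARENTS = {
    "a": [],
    "x": ["a"],
    "b": ["a"],
    "c": ["b"],
    "y": ["x"],
    "m": ["c", "y"],
}

# Hypothetical commit dates as plain integers; "x" is older than "b".
DATES = {"a": 1, "x": 2, "b": 3, "c": 4, "y": 5, "m": 6}


def ancestors(parents, commit):
    """Return the commit itself plus everything reachable via parent links."""
    seen = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def ancestry_range(parents, start, end):
    """Commits reachable from `end` but not from `start` (like `start..end`)."""
    return ancestors(parents, end) - ancestors(parents, start)


def date_range(dates, start, end):
    """Commits dated after `start` and up to and including `end`."""
    return {c for c, d in dates.items() if dates[start] < d <= dates[end]}


if __name__ == "__main__":
    print(sorted(ancestry_range(PARENTS, "b", "m")))
    print(sorted(date_range(DATES, "b", "m")))
```

In this toy history the side-branch commit `x` is older than `b`, so the date window misses it, while the ancestry range includes it because it only enters the history through the merge `m`. Something similar could explain a gap between 3 commits and 124, though that is a guess about this particular repository.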

Can you give me any advice on how to perform this task?

  • Consider using `git describe`, which already does all the work for you. – torek Apr 07 '22 at 19:20
  • Doesn't this command just give me the current version of the repo? – Gerardo Festa Apr 07 '22 at 20:59
  • `git describe` takes options, including a commit hash ID (or anything suitable to specify a particular commit to describe). I use `git tag --contains` all the time to find out which released version contains some particular commit, in the Git repository for Git. – torek Apr 07 '22 at 23:47
  • Ok, so for a commit I get a list of tags. Now, I have to pick just one, but from what I understand, that commit is included in more than one tag (branch?). Let's say I have to evaluate the evolution of the software with some metrics. Would it be wrong to say that if a commit comes right after another commit (tagged with version X), it is a commit that works towards tag/version X+1? – Gerardo Festa Apr 08 '22 at 08:56
  • `git describe` will only output one of the tags, e.g., `git describe 30e12b924b57b15e707f1749f2e5af15f1c7fe09` => `v1.9.0-3-g30e12b924b` (in Git itself, that's a commit from 2010). `git describe --contains` of that hash ID produces `v2.1.0-rc0~135^2~1`. So we know that this commit went in after v1.9.0 and was first released as part of `v2.1.0-rc0` and (since Git's releases are well controlled) *officially* released in v2.1.0. – torek Apr 08 '22 at 23:58
  • As for the fact that many branches and tags will contain any given commit: that's quite true. It is difficult to say which is the right one to use. The `git describe` code uses a clever algorithm that tries to determine the "closest suitable" tag "before" or "after" the given commit, which is tricky since individual commits have only parent links and the overall graph produces the equivalent of "cousins" (descendants with different ancestry but a common ancestor if you go far enough back). – torek Apr 09 '22 at 00:00
  • Only a human can decide *for sure* that some relationship means "this commit should be included" though. What happens when some commit is in some branch, then is reverted, and then the tag is made? – torek Apr 09 '22 at 00:01
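
The `git describe` suggestion from the comments could be scripted roughly as below. This is a minimal sketch under stated assumptions: `git` is on the PATH, the repository is cloned locally, and the helper names (`run_git`, `parse_describe`, `parse_contains`, `previous_and_next_tag`) are hypothetical. The two parsers only handle the output shapes quoted in the comments (`v1.9.0-3-g30e12b924b` and `v2.1.0-rc0~135^2~1`), plus a bare tag name.

```python
import re
import subprocess


def parse_describe(text):
    """Split `git describe` output into (tag, commits_since_tag, short_hash).

    A bare tag name (the commit is exactly the tagged one) yields (tag, 0, None).
    """
    text = text.strip()
    match = re.fullmatch(r"(.+)-(\d+)-g([0-9a-f]+)", text)
    if match is None:
        return text, 0, None
    return match.group(1), int(match.group(2)), match.group(3)


def parse_contains(text):
    """Return the tag name from `git describe --contains` output.

    The suffix after the tag is built from '~' and '^', which cannot
    appear in a tag name, so splitting on the first of them is enough.
    """
    return re.split(r"[~^]", text.strip(), maxsplit=1)[0]


def run_git(repo_path, *args):
    """Run a git command inside repo_path and return its stdout.

    Raises subprocess.CalledProcessError if git fails, e.g. when no tag
    can describe the commit.
    """
    result = subprocess.run(
        ["git", "-C", repo_path, *args],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def previous_and_next_tag(repo_path, commit):
    """Return (closest tag before the commit, a tag that contains it)."""
    before = parse_describe(run_git(repo_path, "describe", "--tags", commit))[0]
    after = parse_contains(run_git(repo_path, "describe", "--contains", commit))
    return before, after
```

Calling `previous_and_next_tag("/path/to/repo", commit_hash)` for each commit would give one candidate "previous" tag and one candidate "containing" tag per commit. Which of the two counts as the commit's version, and how reverted commits are treated, remains a manual decision, as the last comment points out.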
