
I am doing some data mining research, and I need to pull all filenames from a git repository, together with their associated change logs, and write them to a text file.

I am interested in parsing through the change log of each file and finding the Bugzilla bug ID associated with it.

So far the command:

git log --stat > gitoutputlog1.txt

gets me close to what I want, but it includes a lot of information I don't need that could confuse my parser.

Does anybody have ideas for a bash script or command that does this specifically and cleanly?

Has QUIT--Anony-Mousse
Nomad
  • You'd have to define what you think of as the "changelog" for anyone to give you concrete advice on this – Nevik Rehnel Apr 16 '13 at 18:19
  • Please format the code (`git log --stat > gitoutputlog1.txt`) in your post by using an indent of four spaces; this will make your question easier to read and you will get an answer more quickly. – Martijn de Milliano Apr 16 '13 at 18:34
  • The command is highlighted (it is not code; if anything, it is a script). – Nomad Apr 16 '13 at 18:52
  • Changelog is a property associated with each file in a git repository that houses the developer comments for the commit. – Nomad Apr 16 '13 at 18:53

3 Answers


Here is what I understand from the question. It may not be exactly what you want, but I think you can derive your exact answer from it.

To get all the filenames, you first have to clone the repository.

git clone http://github.com/{user}/{project} {dir_name}

Now you can write a shell script like this:

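#!/bin/bash
cd /path/to/clone || exit 1   # Give path to the directory you have cloned
# List every tracked file (recursively) and append its history to somefile.txt
git ls-files | while IFS= read -r file
do
  # $file stores the current file name, relative to the repository root
  echo "== $file" >> somefile.txt
  git log --oneline -- "$file" >> somefile.txt
done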

This writes the git log output for each file directly to the output file. You still need to process it to extract the Bugzilla ID and write that to the output file.
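As a sketch of the extraction step left open above, the functions below pull Bugzilla IDs out of the log text. The names `bug_ids` and `bugs_per_file` are hypothetical, and the regular expression assumes commit messages mention bugs as "Bug 12345" or "bug #12345"; adjust it to your project's convention. It reads the full message (`--format=%B`) rather than `--oneline`, since an ID may sit in the body instead of the subject line.

```shell
# Hypothetical helper: reads commit-message text on stdin and prints each
# distinct Bugzilla ID it finds, one per line, in numeric order.
# ASSUMPTION: bugs are written as "Bug 12345", "bug #12345" or "bug12345".
# The leading (^|non-alphanumeric) stops words like "debug 7" from matching.
bug_ids() {
  grep -oiE '(^|[^[:alnum:]])bug[[:space:]]*#?[[:space:]]*[0-9]+' |
    grep -oE '[0-9]+$' |
    sort -un
}

# Hypothetical driver: for every tracked file in the current repository,
# print "<file><TAB><bug ID>" for each ID found in that file's history.
bugs_per_file() {
  git ls-files | while IFS= read -r file; do
    git log --format=%B -- "$file" | bug_ids |
      while IFS= read -r id; do
        printf '%s\t%s\n' "$file" "$id"
      done
  done
}
```

Run `bugs_per_file > bugs.txt` from inside the cloned repository to get one file/bug pair per line.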

Sachin Jain
  • Is there a way to 'git grep' on an individual file within your loop? – Nomad Apr 16 '13 at 18:57
  • I apologize as well for any questions that may seem elementary; I am a novice when it comes to git and writing bash files. – Nomad Apr 16 '13 at 19:01

So for each commit you want a list of all the changed files and you want a bug number from the commit message.

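doit() {
    # Raw commit object (headers, blank line, message); replace the placeholder with your parser
    bugnumber=$(git cat-file -p "$1" | your-message-parser-here)
    # Files changed by commit $1 (--root also covers the initial commit),
    # each printed on its own line after the bug number
    git diff-tree --no-commit-id --name-only -r --root "$1" | xargs -d '\n' -n1 echo "$bugnumber"
}
git rev-list HEAD | while read -r commit; do doit "$commit"; done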
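`your-message-parser-here` is left as a placeholder in this answer. One possible stand-in is the hypothetical function `parse_bug_number` below, sketched under the assumption that messages reference bugs as "Bug 12345" or "bug #12345". Because `git cat-file -p` prints header lines (tree, parent, author, committer) before the message, it drops everything up to the first blank line so that only the message itself is searched.

```shell
# Hypothetical stand-in for "your-message-parser-here".
# Input: the raw commit object from `git cat-file -p <commit>`.
# Output: the first Bugzilla ID in the message, or nothing if there is none.
# ASSUMPTION: bugs are referenced as "Bug 12345" or "bug #12345".
parse_bug_number() {
  sed '1,/^$/d' |   # drop the header block up to the first blank line
    grep -oiE '(^|[^[:alnum:]])bug[[:space:]]*#?[[:space:]]*[0-9]+' |
    grep -oE '[0-9]+$' |
    head -n 1
}
```

With this defined, `parse_bug_number` can replace `your-message-parser-here` inside `doit`.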
jthill

git log --name-only produces an easy-to-parse format: each file name is on a separate line, with no fancy formatting. You could also look at the --format option; it accepts a format string with placeholders (for example %H for the commit hash or %B for the raw message) for every piece of commit information.
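A minimal sketch of combining both options in one pass: `--format` wraps each full commit message (`%B`) in marker lines, `--name-only` lists the changed files after it, and an awk filter prints one `<bug ID><TAB><file>` line per file. The function name `bug_file_pairs`, the `@@@BEGIN`/`@@@END` markers and the "Bug 12345" pattern are assumptions for illustration; commits whose message has no ID are skipped.

```shell
# Hypothetical helper: reads the output of
#   git log --name-only --format='@@@BEGIN%n%B%n@@@END'
# on stdin and prints "<bug ID><TAB><file>" for every file changed by a
# commit whose message mentions a Bugzilla ID.
# ASSUMPTION: bugs are referenced as "Bug 12345" or "bug #12345"; only the
# first such ID in each message is used.
bug_file_pairs() {
  awk '
    /^@@@BEGIN$/ { inmsg = 1; bug = ""; next }   # start of a commit message
    /^@@@END$/   { inmsg = 0; next }             # message done, file names follow
    inmsg {
      line = " " tolower($0)
      if (bug == "" && match(line, /[^a-z0-9]bug[ \t]*#?[ \t]*[0-9]+/)) {
        bug = substr(line, RSTART, RLENGTH)
        gsub(/[^0-9]/, "", bug)                  # keep only the digits
      }
      next
    }
    NF && bug != "" { print bug "\t" $0 }        # a file changed by this commit
  '
}
```

From inside the clone, `git log --name-only --format='@@@BEGIN%n%B%n@@@END' | bug_file_pairs > bugs.txt` writes one pair per line. The marker lines make the split between message and file list explicit, so it does not depend on how many blank lines git emits between them.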

kan