Using pygit2's discover_repository to locate multiple repositories in a directory

Question

I have a project in which I need to access a (local) directory of bare git repositories in order to get specific items from their history. I need a function which will iterate through the directory and do something like:

repo = pygit2.discover_repository("/path/to/repo")

but I need to be able to do this in a for loop.

Each of the repositories is for a different project, the names of which are located and stored in a list through the use of some nested loops elsewhere in my code.

1) Does it make sense to use the project names in place of repo above if I will only be referencing the project names based on their list index throughout my code (or should I instead give each repo a name like repo_n where n is an integer that gets incremented in each iteration of the loop that discovers the repos)?

2) Is it possible to discover these repos in a for loop so that I can get them all in one go, or will I need to do them one by one?

3) If it is possible to do this in a loop, how can I go about creating a tuple (or maybe a dictionary) that contains the project name and the newly discovered repository object?

I would very much like to know why my question was downvoted so that I may improve it. :) — DJGrandpaJ, Mar 30 '16 at 19:26

score 0 · Accepted Answer · edited May 23 '17 at 12:31

Initially I had started with code like the following:

import os
import pygit2

name = 'repo_'
i = 0
repo_list = {}  # a dict, despite the name
items = get_items()  # defined elsewhere

for item in os.listdir(dirpath):  # dirpath is defined elsewhere
    i += 1  # this was just to add a custom name to the repos located
    repo_name = name + str(i)
    path_to_repo = os.path.join(dirpath, item)

    # discover_repository() returns a path string, not a Repository
    repo = pygit2.discover_repository(path_to_repo)
    repo_list[item] = repo

but this gave me a dictionary of strings instead of a dictionary of Repository objects. It turned out that discover_repository() returns the path to the repository, not a Repository object. I have to say, I couldn't find discover_repository() anywhere in the pygit2 documentation, and I hadn't seen anyone use or talk about it until I found this SO question. But now I know (and I think it'll be useful for future readers as well):

pygit2's discover_repository(path) function returns a string representation of the path to the located repository. This is not a Repository object, which still must be instantiated.
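A minimal sketch of that two-step pattern follows. The helper name open_discovered is hypothetical, and the handling of a failed search is an assumption: depending on the pygit2 version, "nothing found" may be reported as None or as a KeyError, so check this against your installed release.

```python
def open_discovered(start_path):
    """Find the repository containing start_path and open it."""
    # Imported inside the function so this sketch can be loaded
    # even on a machine where pygit2 is not installed.
    import pygit2

    try:
        repo_path = pygit2.discover_repository(start_path)
    except KeyError:
        # Assumption: some older releases raise KeyError when nothing is found.
        repo_path = None
    if repo_path is None:
        raise ValueError("no git repository found at or above %r" % start_path)

    # repo_path is a plain string; this call creates the Repository object.
    return pygit2.Repository(repo_path)
```

The point is only that discover_repository() gives you a path, and Repository() is what turns that path into an object you can query.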

So after looking everywhere for an answer, I found this snippet, which included a line I'd missed:

    path_to_repo = os.path.join(dirpath, item)

    repo = pygit2.discover_repository(path_to_repo)  # path string
    repo_name = pygit2.Repository(repo)  # this line was missing
    repo_list[item] = repo_name

Closer, but something's off here. Sure, this does what I wanted, but isn't that a little redundant? Later, after working on a different section of my code, I ended up with just this as my whole for loop:

for item in os.listdir(dirpath):
    path_to_repo = os.path.join(dirpath, item)

    # Repository() accepts the repository path directly
    repo_list[item] = pygit2.Repository(path_to_repo)

This achieves the desired result. I now have a dictionary that maps each directory name to its Repository object, something like:

{'<first directory name>': <pygit2.Repository object>, '<second directory name>': <pygit2.Repository object>}

So I didn't actually need pygit2.discover_repository() at all. Each entry in the directory is itself a bare repository, so os.path.join(dirpath, item) is already the path that discover_repository() would have handed back, and Repository() accepts that path directly. discover_repository() is mainly useful when the starting path may be somewhere inside a repository rather than at its root. I'm going with the simpler loop because it fits my project's requirements better.
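For the original question 3 (pairing each project name with its repository object), the loop can be wrapped up roughly like this. It is a sketch under assumptions: the function name open_repositories and its open_repo parameter are hypothetical, every subdirectory is assumed to be a repository, and anything that isn't one will make the opener raise.

```python
import os


def open_repositories(dirpath, open_repo=None):
    """Return {directory name: repository object} for each subdirectory."""
    if open_repo is None:
        # Default to pygit2; imported lazily so a different opener
        # can be passed in where pygit2 is unavailable.
        import pygit2
        open_repo = pygit2.Repository

    repos = {}
    for item in sorted(os.listdir(dirpath)):
        path_to_repo = os.path.join(dirpath, item)
        if not os.path.isdir(path_to_repo):
            continue  # skip stray files sitting next to the repositories
        repos[item] = open_repo(path_to_repo)
    return repos
```

Calling open_repositories(dirpath) then gives a dictionary you can index by project name, or turn into (name, repository) tuples with list(repos.items()).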
