
First, some background: we are in the process of migrating a big Git repository from Bitbucket to Azure DevOps. There were some challenges because the repository's history was full of binary blobs that, in hindsight, were totally unnecessary.

After first trying out bfg-repo-cleaner, we ended up using git filter-repo and successfully trimmed the repo from several gigabytes down to "just" around 400 megabytes (depending on what you count). We also rewrote some tag names.
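For reference, the core of the shrink step was something along these lines (the size threshold and tag prefix here are illustrative, not our real values):

    # Strip every blob above a size threshold and rewrite a tag prefix.
    # Note that git filter-repo refuses to run on anything but a fresh
    # clone by default.
    git filter-repo --strip-blobs-bigger-than 10M \
                    --tag-rename v:release-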

Our process was to first make a fresh clone from Bitbucket and then run a shell script that shrinks the repo. After that, we pushed the result to a new, blank repository we had created in Azure DevOps.
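In shell terms the whole migration amounted to roughly this (the URLs are placeholders and shrink-repo.sh stands in for our actual script):

    # Fresh mirror clone from Bitbucket, shrink it, then push everything
    # (branches and tags) to the empty Azure DevOps repository.
    git clone --mirror https://bitbucket.example.com/scm/proj/repo.git repo.git
    cd repo.git
    ../shrink-repo.sh    # runs git filter-repo as sketched above
    git remote add origin https://dev.azure.com/ORG/PROJECT/_git/repo
    git push --mirror origin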

This all went far more smoothly than we expected: git filter-repo was blazing fast and the whole process took less than an hour.

Before we felt safe doing the move (and forcing all of our devs to freeze the repo for a while), we did a couple of test runs to make sure we did not lose any data and that an Azure DevOps pipeline could build our code just as well as Bamboo used to.

We successfully made a YAML pipeline that took roughly 4 minutes to run in total. Feeling confident that we had solved all our problems, we proceeded to do the entire process for real. Everything went smoothly and we quickly moved all our devs to the new repository.

The problem: we then noticed that our new pipeline took far longer to build than our previous tests had. After some digging in the logs, we found it had something to do with downloading objects.

New repo (checkout takes 8 minutes in total):

    remote: Found 39837 objects to send. (1316 ms)
    Receiving objects: 100% (39837/39837), 809.66 MiB | 1.69 MiB/s, done.

Test repo (checkout takes 31s in total):

    remote: Found 11772 objects to send. (358 ms)
    Receiving objects: 100% (11772/11772), 80.17 MiB | 8.75 MiB/s, done.

I think it's relevant to mention that we use --depth=1 during the checkout. In our test pipeline this drastically brought down the checkout time.

We are now at a point where we are happy that everything works and we can say goodbye to a costly VPS hosting both Bitbucket and Bamboo, but frustrated by build times longer than we are used to.

I suspect that our pack files are somehow not optimized well enough, so more of them have to be downloaded to "clone" the repo. I say "clone" because the pipeline seems to init a fresh repo, add a remote, and fetch. When I do a real clone on my local dev machine it takes only 5 minutes (including transfer over the internet and resolving deltas). I find this very strange.
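For reference, that "clone" seems to boil down to something like this (a sketch; the URL is a placeholder and the exact flags are whatever the agent passes):

    git init s
    cd s
    git remote add origin https://dev.azure.com/ORG/PROJECT/_git/repo
    git fetch --depth=1 origin master    # the agent adds its own refspecs/flags
    git checkout --force FETCH_HEAD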

Any help would be greatly appreciated. Thanks,

Piet Eckhart

  • Is it slow only for the first pipeline, or in repeated instances as well? – Mansoor Jan 28 '21 at 09:06
  • We use the free tier for Microsoft hosted build agents. As a result they always start with a fresh build and can't reuse data from previous builds. – Piet Jan 28 '21 at 09:08
  • I think the free tier does not allow self-hosted agents, which is what I would have recommended. As you have already slimmed down the repository, the only other approach would be to refactor the repository and pipelines into smaller, more modular pipelines that you can build concurrently, consuming their outputs as artefacts in further builds. – Mansoor Jan 28 '21 at 09:58
  • If you implement the above approach with git submodules, you may only have to build the small components that have changed and not the entire source. – Mansoor Jan 28 '21 at 10:01
  • As we did manage to produce a setup with an acceptable build time, we want to pursue this option first. One of our goals was to get rid of local hardware / VMs in our CI. – Piet Jan 28 '21 at 11:58
  • We just found out that in our test scenario we did not push tags. Could this in theory result in far more objects? I was thinking about historic tags that are not on the master branch but on historic branches that no longer exist. (A quick way to check this is sketched below.) – Piet Jan 28 '21 at 11:59
  • Please mark it as an answer which will make it easier for people who have the same question to find answers. – Walter Feb 01 '21 at 02:01
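For the tag question above, a quick, rough way to count the objects that only tags pull in (assuming the default branch is called master):

    # Objects reachable from any tag but not from master; a big number
    # here would support the tag theory.
    git rev-list --objects --tags --not master | wc -l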

1 Answer


It turns out the problem was that in our previous test we did not push the tags from Bitbucket to Azure DevOps.

When we did push the tags, the checkout in Azure DevOps took longer because fetching with --tags cancels out much of the benefit of --depth=1 when you have a lot of tags: the shallow fetch still has to download the tagged commit and its entire tree for every single tag.
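As a workaround you can disable the built-in checkout (checkout: none) and fetch by hand without tags. A minimal sketch, assuming the default branch is master and using a placeholder URL; the agent still needs credentials (e.g. $(System.AccessToken)) to fetch:

    # Script step in the pipeline, with the built-in checkout disabled.
    git init .
    git remote add origin https://dev.azure.com/ORG/PROJECT/_git/repo
    git fetch --no-tags --depth=1 origin master
    git checkout --force FETCH_HEAD

At the time, the built-in checkout step did not expose a way to skip tags, which appears to be what the linked feature request asks for.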

See: https://developercommunity.visualstudio.com/idea/878730/git-source-provider-add-ability-to-pull-git-repos.html

Piet