
When you ask a running GitLab instance to generate a full backup archive with the gitlab-rake gitlab:backup:create command:

  • Does GitLab do anything to freeze the application state?
  • Is there any risk of getting a technically working backup that embodies an inconsistent state?

In detail:

  • What happens when new commits are pushed while the backup is being generated?
  • Generally speaking, if any modification is initiated during the backup what can happen?
  • Is there any cache that queues changes to apply to the database or to write to files/repositories?

At the moment I have no idea what happens when you archive a repository that is being modified, or when a backup is taken of a database that is running transactions.


I read through GitLab's backup code today (gitlab.com/gitlab-org/gitlab-ce/tree/master/lib/backup) but could not find any answer to my questions. I do not code in Ruby, so that doesn't help me...

GitLab just runs the tar command on the files to back up.

In the GitLab documentation (docs.gitlab.com/ee/raketasks/backup_restore.html#backup-strategy-option) it is stated that:

"When data changes while tar is reading it, the error 'file changed as we read it' may occur, and will cause the backup process to fail. To combat this, 8.17 introduces a new backup strategy called copy. The strategy copies data files to a temporary location before calling tar and gzip, avoiding the error."

The STRATEGY=copy argument makes gitlab-rake gitlab:backup:create run an rsync -a command to copy all files before creating the archive with tar.
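To make the copy-then-archive idea concrete, here is a minimal Python sketch of the same two steps. It is not GitLab's code: the function name backup_with_copy_strategy and the "data" directory name are made up for illustration, and shutil.copytree stands in for rsync -a.

```python
import shutil
import tarfile
import tempfile
from pathlib import Path


def backup_with_copy_strategy(data_dir: str, archive_path: str) -> None:
    """Copy data_dir to a temporary location, then archive the copy.

    Hypothetical helper for illustration only, not GitLab's implementation.
    """
    with tempfile.TemporaryDirectory() as staging:
        staged = Path(staging) / "data"
        # Step 1: copy the live files to a staging area
        # (stand-in for the `rsync -a` step described above).
        shutil.copytree(data_dir, staged, symlinks=True)
        # Step 2: archive the staged copy. Nothing writes to the staging
        # area, so the archiver never sees a file change while reading it.
        with tarfile.open(archive_path, "w:gz") as tar:
            tar.add(staged, arcname="data")
```

Note that the copy in step 1 is itself not atomic: files can still change while they are being copied, so this only avoids the tar error. It does not by itself give a point-in-time snapshot, which is exactly what I am asking about.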

As I understand the documentation, using the copy strategy means GitLab will never produce a technically corrupted archive and will never fail to create it. I assume this strategy ensures that the generated archive is restorable, but what about the consistency of the data?

Can we make sure the backup archive embodies a consistent/clean snapshot state of the GitLab instance?

I cannot find any information about this in the documentation.


I want to back up GitLab with no interruption.

I know I could stop GitLab for a few seconds and snapshot the LVM volume or filesystem instead of using the integrated backup mechanism, but I do not want to interrupt GitLab.

You can also run GitLab's integrated backup after stopping all services except postgresql, so that no modification can occur during the backup, but you still have to black out the service for your users for some time.


Bonus: My questions also apply to snapshotting the LVM volume or filesystem!

1 Answer

There are a lot of questions about taking a consistent backup of GitLab, but I haven't found a good answer.

Some of the questions:

I can cite @SørenLøvborg's answer, which seems correct:

The repos themselves are backed up using git bundle, so they should be safe as well. Uploads are simple files and write-once, so there should be no issues there either. The database might not be perfectly in sync with repos and files, but not in a way that should cause data loss. All in all, it looks entirely safe to do a backup while GitLab is running, even if it's not atomic.
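As a rough illustration of the git bundle step mentioned in the quote, here is a small Python sketch that bundles a repository and verifies the result. The helper name bundle_repository is hypothetical, and the use of --all is my assumption about which refs get backed up; git bundle create and git bundle verify are standard git subcommands.

```python
import subprocess


def bundle_repository(repo_dir: str, bundle_path: str) -> None:
    """Write the refs of repo_dir into one bundle file, then verify it.

    Hypothetical helper for illustration; pass bundle_path as an absolute
    path, because `git -C` changes the working directory first.
    """
    # Pack every ref and the objects reachable from it into a single file
    # (`--all` is an assumption, not confirmed GitLab behaviour).
    subprocess.run(
        ["git", "-C", repo_dir, "bundle", "create", bundle_path, "--all"],
        check=True,
    )
    # Check that the bundle is well formed and usable with this repository.
    subprocess.run(
        ["git", "-C", repo_dir, "bundle", "verify", bundle_path],
        check=True,
    )
```

As far as I understand, a bundle only contains objects reachable from the refs it records, so a push that lands while the bundle is being written should either be in the bundle or be missing from it entirely, rather than leaving a half-written repository in the backup.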


Edit: you have already received an official response from the GitLab team.

fox91