Backup docker volumes - is simple tar-archiving not sufficient?

Question

I am running several Docker containers on three machines that form a Swarm cluster.

Some containers that store persistent data (DB, Redis, etc.) use data volumes. (I try to avoid bind mounts as far as I can.)

These data volumes are located in /var/lib/docker/volumes/, and every volume is assigned a custom name rather than a random ID:

# ls /var/lib/docker/volumes/
redis-data   postgres-data   fluentd-data ...

I want to back up these volumes periodically, daily for example, so that I can restore them after a machine fails and is later repaired.

However, every document I found on Google shows how to do this with a new Linux container and tar:

https://docs.docker.com/storage/volumes/#backup-restore-or-migrate-data-volumes

$ docker run --rm --volumes-from dbstore -v $(pwd):/backup ubuntu tar cvf /backup/backup.tar /dbdata

Why? Is there any problem if I simply archive the /var/lib/docker/volumes/VOLUME directory and copy it to another machine? For example, with permissions, uid, gid, etc.?

$ tar -zcvf redis.tgz /var/lib/docker/volumes/redis-data

P.S.

A tar backup could be inconsistent if the data changes while it is being archived, for example when a DB data directory is archived while the DB is still running and performing inserts or updates. But I think this problem applies to both approaches in the same way.

score 4 · Accepted Answer · answered Mar 12 '19 at 12:57

A named volume can store data outside of /var/lib/docker. E.g. you can create a named bind mount with:

  $ docker volume create --driver local \
      --opt type=none \
      --opt device=/home/user/test \
      --opt o=bind \
      test_vol

or here's one for an NFS mount:

  $ docker volume create --driver local \
      --opt type=nfs \
      --opt o=nfsvers=4,addr=nfs.example.com,rw \
      --opt device=:/path/to/dir \
      foo

In these scenarios, the tar backup accesses the data the same way your container does, and therefore performs a backup regardless of how the named volume was created. It also effectively exports the data to a common format that can be used not only by other containers, but anywhere you happen to move your application.
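As a rough sketch of that approach, the two functions below archive a named volume through a throwaway container and unpack it again on another machine. The function names, the /volume and /backup mount points, and the `DOCKER` variable are assumptions of this example, not anything Docker defines; `DOCKER` is only a hook so the command can be swapped for `echo` as a dry run.

```shell
# Sketch: back up and restore a named volume without touching /var/lib/docker.
# DOCKER is a hook of this example (not a Docker feature): DOCKER=echo prints
# the command instead of running it.

backup_volume() {
  volume=$1
  dest_dir=$2
  mkdir -p "$dest_dir" || return 1
  # A bind mount given with -v must be an absolute host path.
  dest_dir=$(cd "$dest_dir" && pwd) || return 1
  # Mount the volume read-only and write <volume>.tgz into the host directory.
  "${DOCKER:-docker}" run --rm \
    -v "$volume":/volume:ro \
    -v "$dest_dir":/backup \
    ubuntu tar czf "/backup/$volume.tgz" -C /volume .
}

restore_volume() {
  volume=$1
  src_dir=$2
  src_dir=$(cd "$src_dir" && pwd) || return 1
  # Unpack the archive into the volume; Docker creates the volume if it is missing.
  "${DOCKER:-docker}" run --rm \
    -v "$volume":/volume \
    -v "$src_dir":/backup:ro \
    ubuntu tar xzf "/backup/$volume.tgz" -C /volume
}
```

For example, `backup_volume redis-data ./backups` on the old machine and `restore_volume redis-data ./backups` on the new one, after copying the archive across. The same calls should work whether redis-data is a default local volume, a named bind mount, or an NFS volume, since the container sees all of them as a mounted directory.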

If you find yourself needing more control over the volume contents, for more direct backups, then the named bind mount is a midway point between named volumes and host mounts. The container gets to treat the directory as a named volume, while on the host the data is just another directory to back up.
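With a named bind mount, a direct host-side backup could look roughly like this. The function name and the archive path are made up for the example; the source directory is whatever was passed as `--opt device=...` when the volume was created.

```shell
# Sketch: archive the host directory that backs a named bind mount.
backup_bind_dir() {
  src_dir=$1   # e.g. the path given as --opt device=... at volume creation
  archive=$2   # where to write the .tgz; keep it outside src_dir
  # -C stores paths relative to the directory rather than absolute host paths.
  tar -czf "$archive" -C "$src_dir" .
}
```

For the test_vol volume above this would be something like `backup_bind_dir /home/user/test /some/backup/dir/test_vol.tgz`. Files written by the container may belong to uids that only root can read on the host, so the backup likely needs to run as root.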

Personally, I tend to treat /var/lib/docker as a black box. While the contents are very readable, docker is free to migrate and change things in there between versions, while the API used by users should remain more consistent. The fewer things I need to change should they transition to something like the containerd image management, the better.

score 3 · Answer 2 · edited Oct 27 '18 at 23:54

In fact this is a pattern: the data-only container.

The idea is to have some Docker images dedicated only to storage and others only to applications. Worrying about where your data is physically stored is a pitfall.

You only need to know that your data is stored correctly in a Dockerized infrastructure, not where. And use Docker to create a dump of your data, not cp or tar commands directly.
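One way to read "use Docker to create a dump" for a PostgreSQL container is a logical dump via docker exec, sketched below. This is an interpretation, not something the answer spells out. The function name, the container and database names, and the default user `postgres` are assumptions; `DOCKER` is only a hook of this example so the command can be replaced by `echo` for a dry run.

```shell
# Sketch: logical dump of a PostgreSQL database running in a container.
dump_postgres() {
  container=$1
  database=$2
  outfile=$3
  user=${4:-postgres}   # assumed default user; pass a 4th argument to override
  # pg_dump runs inside the container and writes SQL to stdout,
  # which the host shell redirects into outfile.
  "${DOCKER:-docker}" exec "$container" pg_dump -U "$user" "$database" > "$outfile"
}
```

Unlike archiving the data directory of a running database, pg_dump produces a consistent snapshot even while the database is in use, which also addresses the consistency concern in the question's P.S.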

EDIT

The data-only container was a useful pattern when Docker volumes weren't fully OK. But the idea remains the same (in this kind of infrastructure, you should not need to care where data is stored).

See the Docker Volumes documentation, which starts with:

Volumes are the preferred mechanism for persisting data ...

edited Oct 27 '18 at 23:54 by Sergey Vyacheslavovich Brunov
answered Jul 05 '18 at 13:51 by Max

I think I get what you're saying, but I'm not completely sure. Could you please elaborate or reference some best practices on the web? – janechii Jul 07 '18 at 07:34
Thanks for the answer. That may be the advantage of using another Docker container for the backup: it works in a somewhat "logical" or "user-transparent" manner. However, from a practical point of view, I do not understand the difference between caring where my data is stored on the host machine and caring what the container name and the mount point inside the container are... In any case, the result seems to be the same. Am I right? (I'm sorry, I am not good at English and I'm afraid I can't express exactly what I think.) – gypark Jul 09 '18 at 01:53

score 1 · Answer 3 · answered Mar 12 '19 at 12:17

No problem, as long as you are aware of the consequences and willing to take the risk of depending on the internals of the system. But why take that risk when there is a documented approach that achieves the same result without much complexity?

If I were you, I would use the documented approach, to avoid a maintenance cycle as the product evolves.

If Docker decides to change the location of mount points, or to make it a configurable option, then your undocumented backup approach will fail.
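If the host path is used anyway, it at least does not have to be hard-coded: the CLI can report where it currently keeps a volume. A small sketch, in which the function name and the `DOCKER` override hook are inventions of this example:

```shell
# Sketch: ask Docker for a volume's host path instead of assuming
# /var/lib/docker/volumes/<name>.
volume_mountpoint() {
  # Prints the Mountpoint field reported by `docker volume inspect`.
  "${DOCKER:-docker}" volume inspect --format '{{ .Mountpoint }}' "$1"
}
```

Something like `tar -czf redis.tgz -C "$(volume_mountpoint redis-data)" .` then follows the volume if the data root moves. It is still a host-side backup, though, and for volumes created with driver options (bind or NFS) the reported mount point may not contain the data while no container has the volume mounted.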
