3

What's a good approach to revision-controlling PGP-encrypted text files?

The goal is to

  • only store PGP-encrypted text files (preferably ASCII-armored) anywhere, both in the local repository (working copy) and in the remote (logically "central") repository.

  • preserve the privacy provided by PGP encryption (using GnuPG, for example) in the repositories where the revision history will be stored

  • when possible, reduce storage overhead

If one simply puts the PGP-encrypted, ASCII-armored text file under revision control, its entire content changes every time it is decrypted for editing and then re-encrypted before being committed. Each diff is therefore roughly proportional to the file size, and the repository grows fast even if the change in the decrypted text is small.
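The effect described above can be illustrated with a small sketch. It does not use PGP: `keystream`, `toy_encrypt`, and `differing_fraction` are hypothetical helpers, and the cipher is a toy hash-based stream cipher with a fresh random nonce per encryption, standing in for the fresh random session key that OpenPGP generates for each message. It is for illustration only and should not be used to protect real data.

```python
import hashlib
import os


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 over key + nonce + counter (NOT PGP, demo only)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Encrypt with a fresh random nonce, so every encryption looks unrelated."""
    nonce = os.urandom(16)
    stream = keystream(key, nonce, len(plaintext))
    return nonce + bytes(p ^ k for p, k in zip(plaintext, stream))


def differing_fraction(a: bytes, b: bytes) -> float:
    """Fraction of byte positions at which two byte strings differ."""
    n = min(len(a), len(b))
    return sum(x != y for x, y in zip(a, b)) / n


key = b"demo key, not a real secret"
old_text = b"line of notes\n" * 300
new_text = old_text.replace(b"notes", b"Notes", 1)  # change a single byte

plain_changed = differing_fraction(old_text, new_text)
cipher_changed = differing_fraction(
    toy_encrypt(key, old_text), toy_encrypt(key, new_text)
)

print(f"plaintext bytes changed:  {plain_changed:.4%}")
print(f"ciphertext bytes changed: {cipher_changed:.4%}")
```

A one-byte edit to the plaintext changes almost every byte of the stored ciphertext, so a delta computed on the encrypted files is about as large as the file itself.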

Meng Lu
  • 1
    This seems like an odd thing to do. Why do you want to store encrypted files in a version controlled repository? Why not control access to the repository itself, or encrypt the repository itself? – Nick Johnson Jun 11 '11 at 22:40
  • 1
    It might help to define the scope of this problem. How many people need access to the files? How many files are there? How big are the files? How often will they change? Do the files need to be revision controlled in synchronization with other nonencrypted files? Are all computers accessing these files under your control? Are all the computers on the same network? – this.josh Jun 12 '11 at 01:28

3 Answers

5

You seem to be ordering up a square circle. An important goal of encryption is to avoid any correlation between small changes in plaintext and ciphertext. So, if you ask the poor VCS to deal with encrypted files, you can say goodbye to reasonable space consumption or any deltas.
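As a rough illustration of the space-consumption point, the sketch below compresses repetitive text and an equally long block of random bytes with `zlib`. The random block from `os.urandom` is only an assumed stand-in for well-encrypted output; no actual PGP data is involved, and the sample text is made up.

```python
import os
import zlib

# Repetitive plaintext, similar in spirit to a notes file under revision control.
text = ("2011-06-11 edited the notes file again\n" * 200).encode("ascii")

# Random bytes of the same length, standing in for ciphertext (an assumption).
random_like = os.urandom(len(text))

text_ratio = len(zlib.compress(text, 9)) / len(text)
random_ratio = len(zlib.compress(random_like, 9)) / len(random_like)

print(f"text compresses to        {text_ratio:.1%} of its size")
print(f"random data compresses to {random_ratio:.1%} of its size")
```

The text shrinks to a small fraction of its size, while the random block does not shrink at all, which is why a VCS gains nothing from compressing or delta-encoding ciphertext.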

It's not clear to me whether you are looking to encrypt all of your files or just a few. If the former, it seems to me that you need to go hunting for a VCS that encrypts on the way to and from storage.

If I had this problem, I would be tempted to fork git and experiment with marrying it to gpg.

bmargulies
  • 2
    +1. The whole point of encryption is to make the output indistinguishable from random data. You cannot compress (or diff) random data. If this were possible, it would constitute a major break of PGP... – Nemo Jun 11 '11 at 21:15
  • "If the former, it seems to me that you need to go hunting a VCS that encrypts on the way to and from storage." If I were writing a VCS, I wouldn't implement this as part of the VCS; instead I would make sure that the VCS can store its data in a store that supports encryption, for example the NTFS file system (if it's Windows). – Andrew Savinykh Jun 11 '11 at 21:16
  • 1
    @zespri I could see some argument for a VCS that offered end-to-end encryption out to the client, but I could also see an argument for having it depend on some pre-existing strong disk encryption. – bmargulies Jun 11 '11 at 21:19
2

I was thinking that you could perhaps do some encrypted computation, i.e. encrypt the data in a way that lets the computer do certain computations with it without knowing its value. However, I don't think that would be possible with a diff. Whatever solution you choose will need to ask you for the password every time it takes a diff, decrypt the files, and re-encrypt the diff.
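The decrypt, diff, re-encrypt cycle described above might look roughly like the following sketch. It is not PGP: `toy_encrypt`, `toy_decrypt`, and `encrypted_diff` are hypothetical names, the cipher is a toy hash-based stream cipher used only so the example runs with the standard library, and the shared `key` stands in for whatever passphrase or private key the user would have to supply each time.

```python
import difflib
import hashlib
import os


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 over key + nonce + counter (NOT PGP, demo only)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    nonce = os.urandom(16)  # fresh nonce per message
    stream = keystream(key, nonce, len(plaintext))
    return nonce + bytes(p ^ k for p, k in zip(plaintext, stream))


def toy_decrypt(key: bytes, blob: bytes) -> bytes:
    nonce, body = blob[:16], blob[16:]
    stream = keystream(key, nonce, len(body))
    return bytes(c ^ k for c, k in zip(body, stream))


def encrypted_diff(key: bytes, old_blob: bytes, new_blob: bytes) -> bytes:
    """Decrypt both revisions, diff the plaintext, and re-encrypt the patch."""
    old_lines = toy_decrypt(key, old_blob).decode("utf-8").splitlines(keepends=True)
    new_lines = toy_decrypt(key, new_blob).decode("utf-8").splitlines(keepends=True)
    patch = "".join(difflib.unified_diff(old_lines, new_lines, "old", "new"))
    return toy_encrypt(key, patch.encode("utf-8"))


key = b"demo key, not a real secret"
old_text = "".join(f"entry {i}\n" for i in range(500))
new_text = old_text.replace("entry 250\n", "entry 250 (edited)\n")

old_blob = toy_encrypt(key, old_text.encode("utf-8"))
new_blob = toy_encrypt(key, new_text.encode("utf-8"))

patch_blob = encrypted_diff(key, old_blob, new_blob)
patch_text = toy_decrypt(key, patch_blob).decode("utf-8")

print(f"full revision: {len(new_blob)} bytes, encrypted patch: {len(patch_blob)} bytes")
print(patch_text)
```

The encrypted patch stays small because the diff is computed on plaintext, but the key has to be available at every commit, which is exactly the cost this answer points out.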

Hmmm... searching some more, it looks like what you'd want is homomorphic encryption (http://en.wikipedia.org/wiki/Homomorphic_encryption) that preserves the "diff" operation (although your restriction is somewhat relaxed, as your output domain can be different from your input domain).

Jeremy Salwen
0

Can you encrypt a PGP file for shared access? I think you can't: PGP only allows the one person who knows the secret key to access the data. Version control systems are used for shared access, so this is problematic.

Another issue is that most revision control systems compute and store deltas, so they need to know the plaintext representation of the files anyway.

If they do, nothing prevents you from securing access to the underlying store, for example by keeping the data on an encrypted file system. It won't be PGP-encrypted, but it will be encrypted.

Anyway, what goal are you trying to achieve with PGP encryption? Maybe there are ways other than PGP to achieve it.

I'm not aware of any revision control system that supports PGP, or even asymmetric cryptography in general. I doubt that one exists or would be practical.

Again, this all comes down to the question "what for?". Can you explain the end goal you are trying to achieve by using PGP with a revision control system?

Andrew Savinykh
  • The reason is simply that some of the users' plaintext files need to be revision controlled to keep track of their content history, as well as encrypted for privacy. – Meng Lu Jun 13 '11 at 03:13
  • In this case I would go with securing access to the SCM itself. – Andrew Savinykh Jun 13 '11 at 03:25