0

I want to use a version control system for large files (video files), which I will modify and check in many times. If I use Git, it stores the full file content each time I check in. Is there a tool that stores only the differences between versions of a modified file, so that I can save storage space? (I don't want to store the full file on every check-in.)

I have looked at git-media, which stores the full content on every check-in. I guess git-annex works similarly to git-media.

Thanks

Shane Wealti
Royal Pinto
  • I'm not sure, but does GitHub have a space limit? Anyway, you could force the management type of the file (manage binaries as text and diff them). Maybe you could find another way than putting them into Git. It will slow things down a lot (because Git computes SHA-1 hashes in order to check files...) – ykatchou Aug 30 '11 at 13:32
  • Thanks for your reply. May I know how to manage binaries as text and diff them? – Royal Pinto Aug 30 '11 at 13:35
  • git stores just the diff when you do a `git gc`. – Karl Bielefeldt Aug 30 '11 at 14:57
  • Is there any risk of losing data, or any other issue, in using `git gc`? Why doesn't Git do this by default? – Royal Pinto Sep 06 '11 at 06:18

3 Answers

2

If you are storing binary files that change frequently (which seems to be your case), I would recommend SVN over Git. It stores only the delta. My observation with Git has been that it doesn't handle large, frequently changing binary files very well: the repo size goes up, and you spend a lot of time cloning, etc. This is despite gc'ing the repo and despite the packfiles, which is the point at which Git stores the delta.

Then again, remember that these tools are primarily designed for source code control, and though SVN (and Git) can handle binary files, that is not really their use case.

manojlds
2

This is going to depend on how well the system is able to represent the difference between two video files, which in turn is going to depend on how the video files are stored.

Most version control systems are able to handle binary files; they vary in how well they handle them. Some probably just give up and store each version in its entirety.

Presumably you're using some compressed format (i.e., not every pixel of every frame is stored explicitly). If you have a video X, and you make a small change to it to produce video Y, are X and Y going to have long stretches of identical byte sequences, or is the compression scheme going to scramble everything? If the former, any decent binary diff algorithm should be able to find (and not store) the identical sequences; if the latter, no such algorithm can do so, unless it's specifically aware of the internals of the video format.

You might actually get better results with a format that doesn't compress the data very aggressively, so it leaves something for the comparison algorithm to work on. [EDIT: This is speculation on my part; I have no actual data to back it up, but it seems like a reasonable guess.]
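One rough way to check this for your own files is to measure how many short byte sequences two versions have in common. The sketch below is only an illustration of that idea, not how any particular version control system computes deltas: `shared_window_ratio` and `make_sample` are hypothetical helpers, the sample data is synthetic, and `zlib` stands in for a video codec (an assumption; real codecs behave differently). To try it on real files, read two exports of the same video as bytes and pass them to `shared_window_ratio`.

```python
import random
import zlib


def shared_window_ratio(old: bytes, new: bytes, window: int = 16) -> float:
    """Fraction of `window`-byte slices of `new` that also occur somewhere in `old`.

    A value near 1.0 suggests a binary delta could reuse most of `old`;
    a value near 0.0 suggests almost everything would have to be stored again.
    """
    if len(new) < window:
        return 0.0
    # Every window-sized slice of the old version, for fast membership tests.
    old_windows = {old[i:i + window] for i in range(len(old) - window + 1)}
    total = len(new) - window + 1
    hits = sum(1 for i in range(total) if new[i:i + window] in old_windows)
    return hits / total


def make_sample(seed: int = 0, size: int = 20000) -> bytes:
    """Build deterministic, compressible filler data (hypothetical stand-in for a file)."""
    rng = random.Random(seed)
    words = [b"frame", b"pixel", b"audio", b"track", b"scene", b"cut", b"fade"]
    out = bytearray()
    while len(out) < size:
        out += rng.choice(words) + b" "
    return bytes(out[:size])


original = make_sample()
# A small edit: overwrite 40 bytes near the start, leaving the rest untouched.
edited = original[:100] + b"#" * 40 + original[140:]

# Compare the two versions as stored uncompressed...
raw_ratio = shared_window_ratio(original, edited)
# ...and as stored after compression (zlib here is only a stand-in for a codec).
packed_ratio = shared_window_ratio(zlib.compress(original), zlib.compress(edited))

print(f"raw: {raw_ratio:.3f}  compressed: {packed_ratio:.3f}")
```

If the compressed ratio comes out much lower than the raw one for your format, that would support the guess above that a less aggressively compressed format leaves more for a delta algorithm to work with.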

I know this doesn't actually answer the question, but perhaps it can provide a starting point for you or someone else.

Keith Thompson
0

You could simply use SVN, which can handle binary files out of the box.

khmarbaise