
I have a multi-terabyte filesystem I want to diff against a multi-terabyte tar file. The tar file is only available from stdin - no seeking allowed. I do not have the disk space to write the tar on stdin to disk.

GNU "tar --diff" is almost what I want, except:

  1. It does not allow comparing mtimes within a tolerance - some of the mtimes in the tarball are rounded. It would be better to ignore mtimes than to compare them too precisely.
  2. It does not report on files in the filesystem that are not in the tarball.
  3. GNU tar does not appear to build with a modern gcc.
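
For illustration, here is a minimal sketch of the behaviour described above, assuming Python's standard `tarfile` module in streaming mode (`r|*`), which reads the archive strictly front to back and so never needs to seek. The names `diff_tar_stream` and `same_content` and the 2-second default tolerance are hypothetical, chosen only for this example. It checks type, size, mtime (within the tolerance) and content for regular files and directories, and ignores ownership, permissions, symlinks and devices. It also keeps every member name in memory, which may or may not be acceptable for a very large tree.

```python
import os
import stat
import tarfile


def same_content(tar_file, path, chunk_size=1 << 20):
    """Compare a tar member's data with a file on disk, chunk by chunk."""
    with open(path, "rb") as disk_file:
        while True:
            a = tar_file.read(chunk_size)
            b = disk_file.read(chunk_size)
            if a != b:
                return False
            if not a:  # both streams ended at the same point
                return True


def diff_tar_stream(fileobj, root, mtime_tolerance=2.0):
    """Compare a non-seekable tar stream against the tree under root.

    Returns (differences, missing_from_tar).
    """
    differences = []
    seen = set()
    # "r|*" = sequential stream with transparent compression detection.
    with tarfile.open(fileobj=fileobj, mode="r|*") as archive:
        for member in archive:
            rel = os.path.normpath(member.name)
            seen.add(rel)
            path = os.path.join(root, rel)
            try:
                st = os.lstat(path)
            except FileNotFoundError:
                differences.append((rel, "missing on filesystem"))
                continue
            if member.isdir():
                if not stat.S_ISDIR(st.st_mode):
                    differences.append((rel, "type differs"))
                continue
            if not member.isreg():
                continue  # symlinks, devices etc. are not checked here
            if not stat.S_ISREG(st.st_mode):
                differences.append((rel, "type differs"))
                continue
            # Tolerant mtime comparison, for tarballs with rounded mtimes.
            if abs(st.st_mtime - member.mtime) > mtime_tolerance:
                differences.append((rel, "mtime differs"))
            if st.st_size != member.size:
                differences.append((rel, "size differs"))
                continue
            # The member's data must be read now, before the stream moves on.
            if not same_content(archive.extractfile(member), path):
                differences.append((rel, "content differs"))

    # Second pass: report filesystem entries the tarball never mentioned.
    missing_from_tar = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            if rel not in seen:
                missing_from_tar.append(rel)
    return differences, sorted(missing_from_tar)
```

With the tarball arriving on stdin, this could be driven by something like `diff_tar_stream(sys.stdin.buffer, root)` after `import sys`, where `root` is the directory to compare against.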

Before I code something myself, is something like this already available?

I'd prefer a solution in Python, C, Rust, Java, C++ or Go - in that order.

I googled this for about an hour, and did find several almost-solutions, but nothing precisely like what I'm looking for.

Thanks!

user1084684
  • This sounds interesting, but more like a programming project than a ServerFault question. If this were my project, I'd modify `tar` since it is "almost what [you] want". What is the basis for your 3rd 'except'? GNU tar is so core to Linux, it must be buildable with modern tools. Does this help: [How To Build Debian Packages From Source](https://ostechnix.com/how-to-build-debian-packages-from-source/)? – bitinerant Nov 30 '20 at 19:47
  • It was July's GNU tar 'master' branch that failed to build - with alignment problems. GNU tar master builds today. However, GNU tar is kind of big and not that well documented. Also, I'm getting a discrepancy between a Python program (backshift) and a C program (GNU tar) in some data - I'm tempted to write a verifier in Rust to avoid bug-sharing. That is, assuming something similar doesn't turn up. – user1084684 Dec 01 '20 at 22:50

0 Answers