
I have a problem that has been plaguing me for the last few days and is driving me to despair. I hope somebody can give me a hint about what I have overlooked, since bash/sh is not a field I work in every day:

Scenario: I have a project developed on OS X 10.11.6 that gets packed into a tar file, and a SHA-256 checksum is calculated from this tar file.

In the git pre-commit hook I add the calculated checksum to the repository in a .sha file. Another system that wants to install the project can then check whether the files are the same by also packing them into a tar file, calculating the checksum and comparing it to the .sha checksum from the directory. If the checksums are the same, this version of the package is "verified" and valid for the end user; if not, a warning is displayed.

So the pre-commit hook and the checksum.sh file basically do the same thing, except that the former adds the calculated checksum to the repository.

I use the same tar utility on both systems: (GNU) tar 1.28 on Ubuntu (I tried 1.30 as well, no difference) and gtar (gnu-tar) 1.30 on OS X.

Problem: I get different checksums on OS X than on Ubuntu (16.04 in VirtualBox), even though pkgdiff / DiffMerge / FileMerge (OS X) show no differences in any files. I exclude and normalize a bunch of stuff when building the tar: I exclude any git parts, temporary files, post-install directories, weirdly inconsistent npm files (see my other question here: npm install different package-lock) and the .sha/sha.tar files themselves, and I normalize the modification time and set the owner:group to root:root.

When I compare an Ubuntu-built tar archive to an OS X-built one with pkgdiff, I see no differences. With FileMerge on OS X there is a bunch of obfuscated(?) and rearranged code, which I suspect could be the problem, since I later compare the checksums of those tar archives, but I can't figure out the source of this difference.

System 1 - OS X 10.11.6, gtar 1.30, git v 2.10.1
System 2 - Ubuntu 16.04 LTS, tar 1.28 (and tar 1.30), git 2.74

I would be very happy if somebody has some expertise on this matter and would help a fellow developer to solve this issue, but I am grateful for any input - thanks in advance!

My checksum.sh looks basically like this:

unameOut="$(uname -s)"

case "${unameOut}" in
    Linux*)     tar --mtime='2017-01-01' --exclude='.sha' --exclude='*.git' --exclude='.DS_Store' --exclude='node_modules' --exclude='package-lock.json' --exclude='workstation.json' --exclude="npm-debug.log" --exclude-vcs --exclude=".gitignore" --exclude="sha.tar" --owner=0 --group=0 -cf ./sha.tar ./ 2>/dev/null;
    sha256sum ./sha.tar | cut -d " " -f 1 > .sha_temp_check;;
    Darwin*)    command -v gtar >/dev/null 2>&1 || { printf >&2 "On MacOS gnu compatible TAR is needed, please install gtar via homebrew \n -> brew install gnu-tar ('xcode-select --install' maybe also needed)!\n…Aborting.\n"; exit 1; };
    gtar --mtime='2017-01-01' --exclude='.sha' --exclude='*.git' --exclude='.DS_Store' --exclude='node_modules' --exclude='package-lock.json' --exclude='workstation.json' --exclude="npm-debug.log" --exclude-vcs --exclude=".gitignore" --exclude="sha.tar" --owner=0 --group=0 -cf ./sha.tar ./ 2>/dev/null;
    shasum -a 256 ./sha.tar | cut -d " " -f 1 > .sha_temp_check;;
#    CYGWIN*)    machine=Cygwin;;
#    MINGW*)     machine=MinGw;;
    *)          printf >&2 'Incompatible OS: %s \n…Aborting.\n' "${unameOut}"; exit 1;;
esac

rm sha.tar

stored_sha=$(cat .sha)
checked_sha=$(cat .sha_temp_check)

echo "STORED checksum: $stored_sha"
echo "CALC'D checksum: $checked_sha"

if [ "$checked_sha" = "$stored_sha" ]
then
    echo >&1 "Version verified. Continuing. "
    rm .sha_temp_check
    exit 0
else
    printf >&2 "Keys didn't match. UNVERIFIED VERSION! \n Stored SHA: %s \n Checked SHA: %s\n" "$stored_sha" "$checked_sha"
    rm .sha_temp_check
    exit 1
fi
hreimer

2 Answers


I just ran a test on Debian Linux and Mac OS, and the results are exactly the same.

Maybe the shasum command is not the cause and your ./sha.tar files are simply not the same. Did you try comparing the two sha.tar files with the diff command?
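
A minimal sketch of such a comparison, assuming GNU tar is available (set TAR=gtar on OS X). The helper name compare_tars is made up for illustration. It first checks whether the two archives are byte-identical and otherwise diffs their verbose listings, which show mode, numeric owner/group, size, mtime and member order:

```shell
#!/bin/sh
# Hypothetical helper, not part of the original script.
# compare_tars A.tar B.tar: report how two archives differ.
compare_tars() {
    a=$1
    b=$2
    # Byte-for-byte check first: identical archives give identical checksums.
    if cmp -s "$a" "$b"; then
        echo "identical"
        return 0
    fi
    list_a=$(mktemp)
    list_b=$(mktemp)
    # Verbose listings expose metadata and member order.
    # TAR defaults to "tar"; use TAR=gtar where GNU tar has that name.
    "${TAR:-tar}" --numeric-owner -tvf "$a" > "$list_a"
    "${TAR:-tar}" --numeric-owner -tvf "$b" > "$list_b"
    diff "$list_a" "$list_b"
    status=$?
    rm -f "$list_a" "$list_b"
    return "$status"
}
```

Run both listings on the same machine, since the verbose listing prints times in the local timezone.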

  • Yes, I investigated in that direction (diff'ing the sha.tar files) and it turned out there were more differences between the two .tar files than I expected - I will explain this in a separate answer once my tests are successful – hreimer Jan 16 '18 at 14:30

I finally found the solution by comparing the tar files created on Ubuntu and OS X and eliminating the differences one by one.

Partly because shell / Linux is not my normal field of work, I had overlooked some options that matter for cross-platform tar archive creation. They are as follows:

  • Ownership: I used

    --owner=root --group=root
    instead of
    --owner=0 --group=0
    to normalise the ownership of the input files. However, there is a 'root' group on Ubuntu but there was none on my OS X. A numeric "0" is used directly as the ID, while "root" or any other explicit name first has to be mapped to a user/group on the local system. Apparently this mapping didn't work for the group, since on OS X I always got the default "staff" group ID in the tar header.
  • Permissions: I didn't know the file permissions also had to be normalised. The

    --mode="600"
    option sets all files packed into the archive to the same value (it doesn't matter which one, because I use the tar archive only to calculate a checksum, not to distribute files).
  • Other flags: As a precaution I included the

    --portability
    and
    --dereference
    flags - For the first one see https://www.math.utah.edu/docs/info/tar_8.html:

    When you specify it, tar leaves out information about directories, pipes, fifos, contiguous files, and device files, and specifies file ownership by group and user IDs instead of group and user names.

    and for dereference on the same page:

    causes tar to archive the files symbolic links point to, instead of the links themselves

  • Find & Sort: Comparing the tar archives showed that the order of the files varied a lot. It turns out that when listing the contents of the source folder, Ubuntu uses a different "file sorting order". The difference is not about numbers / dates / names but about how capital letters and hidden files/directories are ordered, and it can be standardised by exporting the "LC_COLLATE=C" variable, for example in the .bashrc file. Since I originally passed the whole folder as input, the tar tool on Ubuntu probably also stored the files in a different order. The solution was to normalise the order of the input files with the "-T" option, which accepts a list of files to be archived. Combining all that: first the files in the current directory are found and some paths are excluded (such as the continually changing git hashes), then the result is piped to the sort tool with LC_COLLATE=C set explicitly, and finally the sorted list is piped to the tar archiver, so that with the "-T -" option only the pre-sorted / pre-filtered files are archived.
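
To illustrate the sorting point, here is a small sketch with made-up file names. It uses LC_ALL=C rather than LC_COLLATE=C, which is my assumption and not what the command in this answer uses: LC_ALL takes precedence over LC_COLLATE, so it also works when LC_ALL is already set in the environment.

```shell
#!/bin/sh
# Hypothetical helper: sort lines from stdin in plain byte order,
# regardless of the locale configured on the machine.
sorted_names() {
    LC_ALL=C sort
}

# In byte order "." sorts before capital letters, and capitals before
# lower-case letters, so this prints: .hidden, A, a, b (one per line).
printf '%s\n' b A .hidden a | sorted_names
```

Without the override, the result depends on the locale of each system, which is exactly what makes the member order of the archive differ.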

After all this was done, the final working command to create a cross-platform tar archive with the same sha256 checksum on both OS X and Ubuntu is (replace "gtar" with "tar" on Ubuntu, since gtar is the gnu-tar version of tar installed by Homebrew on OS X):

find . -type f -not -path "./.git/*" -not -path "./node_modules/*" | LC_COLLATE=C sort | gtar --mtime='2017-01-01' --exclude='.sha' --exclude='*.git' --exclude='.DS_Store' --exclude='node_modules' --exclude='package-lock.json' --exclude='workstation.json' --exclude="npm-debug.log" --exclude-vcs --exclude=".gitignore" --exclude="sha.tar" --portability --mode="600" --owner=0 --group=0 --dereference -T - -cf ./sha.tar
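
The same idea can be wrapped so that one script runs on both systems. This is only a sketch under some assumptions: the function names are made up, the --exclude list is left out for brevity, the archive is written to stdout instead of ./sha.tar, and LC_ALL=C is used in place of LC_COLLATE=C.

```shell
#!/bin/sh
# Pick GNU tar: "gtar" where Homebrew installed it, plain "tar" otherwise.
if command -v gtar >/dev/null 2>&1; then TAR=gtar; else TAR=tar; fi

# Pick a SHA-256 tool: sha256sum (Linux) or shasum (OS X).
if command -v sha256sum >/dev/null 2>&1; then
    sha256() { sha256sum | cut -d ' ' -f 1; }
else
    sha256() { shasum -a 256 | cut -d ' ' -f 1; }
fi

# project_checksum DIR (hypothetical name): print the SHA-256 of a
# normalised tar stream of DIR. Runs in a subshell so the cd stays local.
project_checksum() (
    cd "$1" || exit 1
    find . -type f -not -path './.git/*' -not -path './node_modules/*' \
        | LC_ALL=C sort \
        | "$TAR" --mtime='2017-01-01' --mode='600' --owner=0 --group=0 \
              --portability --dereference -T - -cf - \
        | sha256
)
```

Calling project_checksum . in the project root prints one hex digest. Changing a file's permissions or modification time should leave the digest unchanged, while changing its content should not.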

(A useful link for analysing the tar header: tar header format)
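
For a quick look without extra tools, the first header block can be read directly. This is a rough sketch with made-up function names. It relies on the standard tar header layout, where the name field starts at byte 0 (100 bytes), mode at 100 (8 bytes), uid at 108 (8), gid at 116 (8) and mtime at 136 (12), with the numbers stored as octal text:

```shell
#!/bin/sh
# header_field ARCHIVE OFFSET LENGTH (hypothetical helper):
# print one field of the first header block, with NUL padding removed.
header_field() {
    dd if="$1" bs=1 skip="$2" count="$3" 2>/dev/null | tr -d '\0'
}

# show_first_header ARCHIVE: print the fields that most often differ
# between systems. Numeric values are octal.
show_first_header() {
    printf 'name:  %s\n' "$(header_field "$1" 0 100)"
    printf 'mode:  %s\n' "$(header_field "$1" 100 8)"
    printf 'uid:   %s\n' "$(header_field "$1" 108 8)"
    printf 'gid:   %s\n' "$(header_field "$1" 116 8)"
    printf 'mtime: %s\n' "$(header_field "$1" 136 12)"
}
```

Running show_first_header on the archives from both systems makes a stray group ID or permission value visible at a glance.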

hreimer