Track files but exclude them from a git bundle

Question

I have a somewhat complex ansible workflow. I have two airgapped networks. I develop playbooks on both networks, so I have two somewhat independent ansible repositories managed by git. At the same time, most of the playbooks can be used in both places. To complicate matters, this is a one way transfer. I can transfer from network A to B, but not from B to A.

I have template files with information relevant to one network but not relevant on the other. I've designed it so that filenames should be the same (as well as variable names in Jinja2 templates). I want to be able to create a git bundle that excludes the files, so that when I pull from the bundle on the other network's repository, the files don't get overwritten. Because including the wrong information in the template files could conceivably break the entire environment, I need to track the Jinja2 template/variable files in Git.

Does anyone have a workflow recommendation, or a git command besides using .gitignore (because the files need to be tracked so I can roll back in emergencies) that will help me accomplish this?

OK, so as a follow-up: since I can't track the configuration data in this repository structure, is there a way I can use a .gitignore file that excludes a specific directory (or directories) from the main repository, and then somehow create a second git repo that exists within the file tree to track the actual configuration data? I'm not holding out much hope, as that is a terribly messy and complex situation, and everything I've learned about system administration for the last 20 years screams don't do it... but if I had to, could I? — Herald Storm, Sep 10 '17 at 05:29
You could do this with a Git submodule. I dislike submodules, though; I would avoid them if possible. A submodule is, in essence, a separate Git repository where, in the parent ("superproject") repository, you record which specific *commit* one should use from the other repository. This is just a somewhat more automated form of saving a raw hash ID for the submodule in a plain-text file in what Git calls the "superproject". Hence you can transfer (via bundle or whatever) the superproject data and simply never bother with the other Git repository. — torek, Sep 10 '17 at 06:31
I certainly agree with your hesitance to employ something like this. I just want to have some sort of backup plan in case I can't find another reasonable solution that allows me to track changes. I do like the idea of a configuration script. I am thinking about having a separate branch in the repo to track such things, and then I can make sure that the scripts are only in the file system tree as a result of positive action. Good security! Thanks very much for your advice and help! — Herald Storm, Sep 11 '17 at 15:41

score 3 · Accepted Answer · answered Sep 09 '17 at 19:45

There's no completely trivial way to do this.

Fundamentally, a file is tracked in Git if and only if it is in the index. The index is (normally, initially) populated from some commit, so that it is some previous commit that determines if a file is to be tracked. Assume there exist sets of commits T and U that are similar except that there are some files not in commits U that are in commits T. Then:

git checkout any-T-sub-i-commit

results in the file(s) being in the index (and hence tracked), while:

git checkout any-U-sub-j-commit

results in the file(s) being not-in the index (and hence untracked).

The same holds in a more general fashion for operations like merging: when you work with commits from set T, you work with the ones that have the files; when you work with commits from set U, you work with the ones that lack the files. If you merge any T_i commit with any U_j commit, the effect on any such file—whether it's added, removed, or conflicted—depends on whether the merge-base commit is in set T or set U, and the specific changes to those files within commit T_i with respect to the merge base commit.

Of course, as files move into or out of the index, Git also copies them into, or removes them from, the work-tree at the same time (with the usual caution about not removing unsaved-but-precious data). So this means that the work-tree file will vanish and reappear depending on whether you check out a T commit or a U commit.
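To see the tracked/untracked flip concretely, here is a minimal sketch in a throwaway repository. The branch names (u-branch, t-branch) and the file group_vars/all.yml are made up for illustration; they are not from the question.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch repository; all names below are assumptions.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

# A "U" commit: only the shared playbook.
echo "- hosts: all" > site.yml
git add site.yml
git commit -q -m "U: shared playbook"
git branch u-branch

# A "T" commit: the same content plus a network-specific variables file.
git checkout -q -b t-branch
mkdir -p group_vars
echo "ntp_server: 10.0.0.1" > group_vars/all.yml
git add group_vars/all.yml
git commit -q -m "T: add network-specific vars"

# On the T commit the file is in the index, hence tracked.
tracked_on_t=$(git ls-files group_vars/all.yml)

# Checking out the U commit takes it out of the index and the work-tree.
git checkout -q u-branch
tracked_on_u=$(git ls-files group_vars/all.yml)
if [ -e group_vars/all.yml ]; then exists_on_u=yes; else exists_on_u=no; fi

echo "tracked on T: ${tracked_on_t:-<none>}"
echo "tracked on U: ${tracked_on_u:-<none>}"
echo "exists in work-tree on U: $exists_on_u"
```

The point of the sketch is only that git ls-files (which lists the index) and the work-tree agree with whichever commit is checked out; nothing here is specific to bundles yet.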

Meanwhile, let's look at what a bundle is, at least in an abstract sense. The essence of a bundle is that it contains at least all the data that git fetch or git push would send across the wire, after the git fetch or git push communication process that serves to minimize this data. (It can contain extra data, which will simply be ignored.) This minimal data consists of all of the objects that must be copied—annotated tags, commits, trees, and blobs—plus the reference names and their values.
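As a rough illustration of the "reference names and their values" part, git bundle list-heads prints exactly those pairs for a bundle file. The repository, the branch name shared, and the file names are assumptions made up for this sketch.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch repository; names are assumptions.
work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
git config user.email "demo@example.com"
git config user.name "Demo"

echo "- hosts: all" > site.yml
git add site.yml
git commit -q -m "initial playbook"
git branch shared

# Bundle every reference, then list the names and hash IDs stored in the bundle.
git bundle create "$work/all.bundle" --all
heads=$(git bundle list-heads "$work/all.bundle")
tip=$(git rev-parse shared)

echo "$heads"
```

Each output line is a hash ID followed by a reference name, which is the same information git fetch would learn from a live remote.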

To exclude some set of files from the bundle, then, you need to bundle exclusively the U commits, and not any of the T commits. That's fine as far as it goes: if you have all branches duplicated, and distinguish between T commits and U commits by branch names, you can achieve this pretty easily. But the consequence is that every time you make a new T commit you must make a corresponding U commit, and vice versa. You have, in effect, doubled your workload.
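A minimal sketch of bundling only the U side, assuming the U commits live on a branch called shared and the T commits on a branch called net-a-config (both names are hypothetical, as are the directory and file names):

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)

# "Network A" repository with one U commit and one T commit on top of it.
git init -q "$work/net-a"
cd "$work/net-a"
git config user.email "demo@example.com"
git config user.name "Demo"

echo "- hosts: all" > site.yml
git add site.yml
git commit -q -m "U: shared playbook"
git branch shared                  # hypothetical branch holding U commits

git checkout -q -b net-a-config    # hypothetical branch holding T commits
echo "ntp_server: 10.0.0.1" > vars.yml
git add vars.yml
git commit -q -m "T: network A variables"
t_commit=$(git rev-parse net-a-config)

# Bundle only the U branch; the T commit is not reachable from it.
git bundle create "$work/shared.bundle" shared
git bundle verify "$work/shared.bundle" >/dev/null 2>&1

# "Network B" repository: fetch from the bundle into a remote-tracking name.
git init -q "$work/net-b"
cd "$work/net-b"
git fetch -q "$work/shared.bundle" shared:refs/remotes/net-a/shared

files_in_bundle=$(git ls-tree -r --name-only net-a/shared)
if git cat-file -e "$t_commit" 2>/dev/null; then has_t=yes; else has_t=no; fi

echo "files received: $files_in_bundle"
echo "T commit present on network B: $has_t"
```

This only shows that the T commit and its files stay behind. It does nothing about the doubled workload: every change still has to be made on both branches by hand.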

The standard recommendation that applies to configuration files in general applies here as well: Do not commit any configuration, ever. Commit only sample or default or template configurations. Use some kind of wrapper to convert these sample configurations to real configurations. (The wrapper can also be committed, of course, if it's something you write yourself, such as a shell script or Python program or whatever.) You may now maintain, and version-control, these sample / default configurations. Cloning the repository obtains the samples, and updating from the clone—git fetch followed by a merge or rebase—updates the samples, but does not touch the actual configuration. Depending on how smart the wrapper is and what's available in your output format,¹ it can even auto-detect that the sample/default input has changed, and warn or fail any runs that use the prescribed tool (i.e., the wrapper itself) until the real configuration is updated to match any required changes coming from the sample/default/template configuration.

This is still not trivial—in particular, you may have to write a wrapper, and educate users on the correct way to run your particular system. But it's as close to trivial as you are likely to achieve.
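One possible shape for such a wrapper, as a sketch only. It assumes a tracked sample file vars.yml.sample and an untracked (for instance, .gitignore'd) real file vars.yml, and it stamps the real file with a checksum of the sample in a YAML comment. The function names, file names, and the "# sample-cksum:" comment convention are all invented here.

```shell
#!/bin/sh
set -eu

# Checksum of a sample file (POSIX cksum; first field is the CRC).
sample_sum() { cksum < "$1" | cut -d' ' -f1; }

# check_config SAMPLE REAL (hypothetical helper)
#   returns 0 if REAL exists and was last reviewed against the current SAMPLE,
#   1 if REAL had to be created from SAMPLE, 2 if SAMPLE has changed since.
check_config() {
    sample=$1
    real=$2
    current=$(sample_sum "$sample")
    if [ ! -f "$real" ]; then
        # First run: seed the real config from the sample, stamped in a comment.
        { echo "# sample-cksum: $current"; cat "$sample"; } > "$real"
        echo "created $real from $sample; edit it, then re-run" >&2
        return 1
    fi
    recorded=$(sed -n 's/^# sample-cksum: //p' "$real" | head -n 1)
    if [ "$recorded" != "$current" ]; then
        echo "$sample changed since $real was reviewed; update $real and its stamp" >&2
        return 2
    fi
    return 0
}

# Demonstration in a scratch directory.
demo=$(mktemp -d)
cd "$demo"
printf 'ntp_server: CHANGEME\n' > vars.yml.sample

check_config vars.yml.sample vars.yml && first=0 || first=$?    # creates vars.yml
check_config vars.yml.sample vars.yml && second=0 || second=$?  # stamp matches

# Simulate a pull that brings in a newer sample.
printf 'ntp_server: CHANGEME\ndns_server: CHANGEME\n' > vars.yml.sample
check_config vars.yml.sample vars.yml && third=0 || third=$?    # stamp is stale

echo "exit codes: $first $second $third"
```

A real wrapper would go on to run ansible-playbook only when check_config returns 0. That step is left out here so the sketch runs without ansible installed.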

¹In this particular case, your output is most likely the YAML files for ansible. This means you can hide all kinds of useful sample/default-config information in comments, for instance.

Wow, I really appreciate you putting this into such a formal analysis! My set theory is a bit rusty, but I do get what you're explaining. Unfortunately, you've also confirmed my suspicions and fears. I'll either have to maintain the actual configuration value data without tracking it in the same repository, or go to somewhat ridiculous efforts to keep it out of the bundle for transfer. — Herald Storm, Sep 10 '17 at 05:25
