What to save in Git (or any other revision control system)?

Question

I'll take Drupal as the example for my first question. We have a Drupal instance with a custom theme and many downloaded modules. We have a server used as the source code repo. The current workflow is to work on the dev instance of Drupal, then migrate the changes over to test. Once clients have tested the changes and like them, we move them to production. Should I place the whole Drupal dir in Git? Or theme + plugins? Or just the theme? One advantage of placing the whole instance in Git is that I can easily clone the changes on prod, and it would look identical to dev and test.

The second question is about local customizations to a product's core code (let's take DSpace as an example). We get version 1.6, make local changes and keep them in our repo. Then when they release version 1.7, I have to get that and merge my local changes into it. How do I get 1.7? Can I get it into a branch?

What is the best practice in these situations?

Thanks.

PS - I know about git ignore and how to do it technically. What I'm looking for is advice on best practices.

score 2 · Accepted Answer · edited Jun 20 '20 at 09:12

In order of importance:

1. Put everything into Git which is necessary to build the project (i.e. including all dependencies which the build process doesn't download by itself).
2. Put everything into Git which is necessary to run the project on a developer machine.
3. All the generated files (often useful for finding sudden bugs, a.k.a. "But I didn't change anything!"). These are nice to have but can be a nuisance since they change often.
4. All the dependencies (so you can easily recreate an old build).

I usually don't put the production version into Git; instead I have a script which creates a production site out of the project plus an upload script to deploy this site. That way, I can give the new production version a local run. And the upload script has a step "backup old files" so I can restore the old production site in a few minutes (well, unless some bug has corrupted the database).
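
A minimal sketch of what such a deploy step with a "backup old files" stage could look like, assuming the production site is a plain directory on a reachable filesystem. The names `deploy_with_backup` and `restore` and the backup naming scheme are hypothetical, not the scripts described above:

```python
import shutil
import time
from pathlib import Path
from typing import Optional


def deploy_with_backup(build_dir: Path, prod_dir: Path, backup_root: Path) -> Optional[Path]:
    """Replace prod_dir with a copy of build_dir, backing up the old site first.

    Returns the backup directory, or None if there was no old site to back up.
    """
    backup_dir = None
    if prod_dir.exists():
        # "Backup old files": move the current site aside under a timestamped name.
        backup_root.mkdir(parents=True, exist_ok=True)
        backup_dir = backup_root / time.strftime("site-%Y%m%d-%H%M%S")
        shutil.move(str(prod_dir), str(backup_dir))
    # Copy the freshly built site into place.
    shutil.copytree(build_dir, prod_dir)
    return backup_dir


def restore(backup_dir: Path, prod_dir: Path) -> None:
    """Put a previous backup back in place of the current production site."""
    if prod_dir.exists():
        shutil.rmtree(prod_dir)
    shutil.copytree(backup_dir, prod_dir)
```

This only covers files. As noted above, it does nothing for a database that a bug has corrupted.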

[EDIT] A lot of people disagree with points #3 and #4.

First of all, as I already said above, this is an ordered list, so the first two points are much more important than the rest.

That said, it still makes a lot of sense to version generated files. Common cases are IDE project settings and output of code generators.

Why version them when it's so simple to recreate them? There are several reasons:

Hard disk space is cheap. If this helps you find a bug one hour sooner, that saves about $100, which buys about 2TB of disk space.
IDE project files contain compiler settings and other very important configuration. If you don't always have them under version control, you will eventually have this situation: The files will be in the ignore list of your DVCS. You find a problem (like an important warning which should be active). You activate the warning. You forget to add the changed file to version control.

Or: You're part of a team. After one month, everyone in the team uses different options to build their projects. Builds break because something is configured as a warning in your IDE but it's an error for someone else.
Code generated by some tool surely shouldn't be versioned? Isn't it enough to version the config files of the code generator?

Maybe. The question is again: Will you ever have to find a bug in this code? And even if not: When the files are under version control, it will be very easy to see what the change of a config option means. You just change the option, run the code generator and let your version tool show you what has changed. Sure, you could do something similar manually but why waste so much effort for something that can be had for free?
Most importantly, it will cause "spurious merge conflicts". Many people think this is a problem but it's not. In fact, you will want to see these problems because it means that your build is unstable:
1. Your VCS says a file has changed when it shouldn't; that's not a bug, it's a boon. Without version control, this change would have happened anyway but you wouldn't have noticed it. When was the last time ignorance was better than knowledge?
2. You have merge conflicts in one file over and over again. This is an indicator that your build or project setup needs some love. Again: Solving merge conflicts is a pain. But is it really better that every developer gets their own version of the file without noticing?
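
One way to check for this kind of instability without waiting for a merge conflict is to run the build twice from the same state and compare checksums of the generated files. A minimal sketch, assuming the generated files land in a single output directory; `snapshot` and `changed_files` are hypothetical helper names:

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def snapshot(output_dir: Path) -> Dict[str, str]:
    """Map each file's path (relative to output_dir) to the SHA-256 of its contents."""
    return {
        path.relative_to(output_dir).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(output_dir.rglob("*"))
        if path.is_file()
    }


def changed_files(before: Dict[str, str], after: Dict[str, str]) -> List[str]:
    """Return the sorted names of files that were added, removed, or modified."""
    return sorted(
        name
        for name in before.keys() | after.keys()
        if before.get(name) != after.get(name)
    )
```

Take a snapshot, rebuild without touching any input, and take a second snapshot. If `changed_files` returns anything, the build produces different output from identical input, which is the situation versioned generated files would have exposed.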

answered Aug 17 '12 at 15:11 by Aaron Digulla · edited Jun 20 '20 at 09:12 by Community

I disagree with #3. Generated resources should not be versioned unless you have a special use case. See http://stackoverflow.com/a/10855342/1301972. – Todd A. Jacobs Aug 17 '12 at 15:17
I know. Often, it's bad to do it. But when using code generators, having the generated source on version control is a very good way to locate and fix bugs in the generator. – Aaron Digulla Aug 17 '12 at 15:19
Thanks for the reply. So what you are saying is when working with open source products like Drupal or Wordpress place the whole system in Git rather than just the theme? – Dhanushka Aug 17 '12 at 15:23
Imagine your computer dies (hard disk crash, theft, virus destroys all your data, hacker changes production files). What would be the fastest way to make you productive again? – Aaron Digulla Aug 17 '12 at 16:14
I strongly disagree with 3 and 4. There is absolutely no need to include generated files since they will be generated by the build, and checking them in will cause spurious merge conflicts. And if these files are binaries you can not even diff them to see the differences. VCSes are also not well equipped to handle binary files, they are usually badly delta compressable and will be a major contributor to repository size growth. – Laurens Holst Aug 20 '12 at 12:13
For a lot of the same reasons, dependencies should also not be checked in. Instead you should use a dependency system like Maven or Ivy (Ant) or NPM (NodeJS) to download them during the build. Although in environments without such a system, you could consider to check them in. – Laurens Holst Aug 20 '12 at 12:13
Also I personally prefer not to check in IDE project files if they are easy to recreate with some simple setup instructions. This to prevent a proliferation of various IDE project files that may or may not get outdated quickly. An example of this would be to explain in the README how to generate an Eclipse project from a Maven POM file (using the `mvn eclipse:eclipse` command). – Laurens Holst Aug 20 '12 at 12:14
@LaurensHolst: In my case, the IDE files aren't easy to recreate since they contain project-specific modifications (i.e. I'm rarely using the defaults). This is basically the key factor: If the files just contain defaults and there is **no manual effort** required to create them, they shouldn't be checked in. But as soon as generated files (like IDE settings) have been changed by a human, always check them in, because otherwise it's too easy to miss a change. – Aaron Digulla Aug 20 '12 at 12:19
@LaurensHolst: The deciding factor for generated files is whether the code generator is stable. If you're still developing the generator, you will want the generated files versioned so bugs can be tracked (= in which version of the generator they were introduced/fixed). It's also very helpful when you make modifications -> the versioning tool shows a diff what the new settings/feature would change. Conclusion: While your reasons seem "obviously correct", they will cause all kinds of subtle problems. – Aaron Digulla Aug 20 '12 at 12:21
@AaronDigulla: But there’s the problem, IDE settings are often personal. If you have personalized your IDE you would not want to commit this for everyone, but if you don’t the file will stay modified. And when other people do check in their settings it will cause conflicts. However if those settings files contain (only) build-related information (e.g. I believe XCode project files contain information on which files are used to compile a binary), then it makes sense to check them in. – Laurens Holst Aug 20 '12 at 13:41
@AaronDigulla: Wrt. the generated code, I think if you (ab)use version control for that kind of thing, you should do it in a separate repository. Possibly a subrepository (or submodule, in git terminology). Or you could reformulate that generated code as tests which check the expected output. – Laurens Holst Aug 20 '12 at 13:45
What "IDE settings" are you talking about? IntelliJ used to save the open files and other personal information in the project which means these files can't be shared in the team. It also saved compiler options in the same file so the file had to be shared. Which do you prefer? Knowing about these IDE bugs or getting spurious errors because of ignorance? :-) – Aaron Digulla Aug 20 '12 at 13:48
Submodules or subrepos cause other problems while not really solving any. It all boils down to: Is your build stable or random? When the build is stable (= from a certain state, you always get the same output), there is no harm in versioning generated files. The problem is that most people love to ignore random builds (you know you have one when you hear "but it works on my PC!") and instead of fixing this, they prefer to blame the tools. – Aaron Digulla Aug 20 '12 at 13:50
#3 and #4 are just flat out wrong. Actually, even #1 is wrong because you should leave installing dependencies to the developer's package manager and just document the dependencies. – alternative Aug 20 '12 at 20:33
@alternative: I would prefer that you try to understand an answer before you downvote it. – Aaron Digulla Aug 21 '12 at 07:01
