How to clone, then sync/update/push a fork with the upstream master

Question

I think I've read through some of the tutorials and I'm stuck on something entirely basic (I hardly ever use the commandline git, so please be patient ;)).

All I want to do is update my fork (https://github.com/abelbraaksma/visualfsharp) to the latest version of Master (https://github.com/Microsoft/visualfsharp) from the upstream repo. Since I have local changes that I do not care about, I decided to create a new clone (previously I used GUI tools, but they are so confusing and limiting that I gave up on that and dived into the woods of git commands ;).

I did:

cd /D/Projects/OpenSource/VisualFSharp2
git init
git clone https://github.com/abelbraaksma/visualfsharp
git fetch https://github.com/Microsoft/visualfsharp
git remote add upstream https://github.com/Microsoft/visualfsharp
git remote add origin https://github.com/abelbraaksma/visualfsharp
git fetch upstream
git checkout master
git merge upstream/master

The last two commands give:

git checkout master
Already on 'master'
Your branch is up to date with 'upstream/master'.

git merge upstream/master
Already up to date.

I realize I did some things in the wrong order and since I come from SVN and Mercurial worlds, I am often confused by terminology.

I understand that currently I am in "master" of the upstream repo. But I need to merge from the upstream repo into the origin (my fork) repo. I assume I need to update the local copy to whatever the head is of my fork (but git checkout master doesn't do that).

I basically tried to follow this guide on syncing, combined with configuring remote points.

Where am I confused or better, what commands did I get backward?

Doing git remote -v gives me:

origin  https://github.com/abelbraaksma/visualfsharp (fetch)  
origin  https://github.com/abelbraaksma/visualfsharp (push)  
upstream        https://github.com/Microsoft/visualfsharp (fetch)  
upstream        https://github.com/Microsoft/visualfsharp (push)

score 7 · Accepted Answer · answered Jun 29 '18 at 07:47

TL;DR

You're OK, but you have an extra repository you probably should just delete. You should generally start by cloning (with git clone) the repository that you want to have your Git call origin, and then git remote add upstream <the other url> and work from there.
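A minimal sketch of that order, for reference. To keep it runnable offline, two throwaway local repositories stand in for the two GitHub URLs; the temporary paths, the UPSTREAM and FORK variable names, and the Demo identity are placeholders I made up, not part of your setup. With the real repositories you would put the GitHub URLs where the variables are used.

```shell
set -eu
tmp=$(mktemp -d)

# Stand-ins for https://github.com/Microsoft/visualfsharp (UPSTREAM)
# and https://github.com/abelbraaksma/visualfsharp (FORK).
UPSTREAM="$tmp/upstream"
FORK="$tmp/fork.git"
git init -q "$UPSTREAM"
git -C "$UPSTREAM" symbolic-ref HEAD refs/heads/master
git -C "$UPSTREAM" -c user.name=Demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "first commit"
git clone -q --bare "$UPSTREAM" "$FORK"

# 1. Clone the repository you want your Git to call origin (no git init first).
git clone -q "$FORK" "$tmp/visualfsharp"
cd "$tmp/visualfsharp"

# 2. Add the other repository as a second remote, then fetch from it.
git remote add upstream "$UPSTREAM"
git fetch -q upstream

# Both remotes should now be listed, each with a fetch and a push URL.
git remote -v
```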

Read the long description below to see what you have now and how to work with it.

Long: what you did, in detail

git init

This creates a new, empty Git repository in the current directory. (If there is already a Git repository here, that is, if git rev-parse --git-dir would print some directory name rather than failing and saying "I find no repository", it basically does nothing at all, making it safe to run. There are some corner cases here, but you are unlikely to run into them.) Since you intend to clone a repository, you don't really want to do this, because git clone also does a git init, as we will see in a moment.
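If you want to see that check for yourself, here is a small sketch. It assumes the temporary directory it creates is not itself inside some other Git repository; the variable names are just for illustration.

```shell
set -eu
tmp=$(mktemp -d)   # throwaway directory, assumed not to be inside another repository
cd "$tmp"

# Outside a repository, rev-parse fails with a non-zero exit status.
if git rev-parse --git-dir >/dev/null 2>&1; then
    before="repository found"
else
    before="no repository"
fi
echo "before init: $before"

git init -q

# Inside a repository, it prints the Git directory (".git" at the top level).
git rev-parse --git-dir

# Running git init a second time leaves the existing repository in place.
git init -q
after=$(git rev-parse --git-dir)
echo "after second init: $after"
```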

Before we go on to the git clone below, let's take a moment to make a note about a weird state for a new, empty repository. You are probably familiar by now with the idea that a branch name like master actually just holds the hash ID of one (1) commit. Git uses the name to find the last commit on the branch, which Git calls the tip commit. Git then uses the tip commit to find the previous or parent commit, and uses the parent's parent to work back through history. By following the chain of parents, Git finds all the commits that are reachable from the branch name.

But an empty repository has no commits. There's no tip of master for the name master to point to: no latest commit in master whose hash ID can be stored under the name master. Git's solution is to not have a master branch yet. At the same time, Git declares that you are "on branch master", as git status would say, so you're on a branch that doesn't exist yet.
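A short sketch of that state, in a throwaway directory. The git symbolic-ref line is my addition: it only forces the initial branch name to master in case your Git is configured with a different default.

```shell
set -eu
tmp=$(mktemp -d)
cd "$tmp"
git init -q
# Force the initial branch name to master, whatever init.defaultBranch says.
git symbolic-ref HEAD refs/heads/master

# HEAD names the branch you are "on"...
current=$(git symbolic-ref --short HEAD)
echo "on branch: $current"

# ...but no such branch exists yet, because there is no commit for it to point to.
if git rev-parse --verify --quiet refs/heads/master >/dev/null; then
    exists=yes
else
    exists=no
fi
echo "master exists: $exists"

# git branch lists nothing at all in this state.
branches=$(git branch --list)
echo "branches: [$branches]"
```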

This weirdness factors in later. For now, let's move on to git clone, and look at what it does. In this case it makes another, separate repository, which you subsequently don't use at all.

git clone https://github.com/abelbraaksma/visualfsharp

This is mostly equivalent to the series of commands:

mkdir visualfsharp: create a new sub-directory within the current directory (current being /D/Projects/OpenSource/VisualFSharp2)
cd visualfsharp: enter the new sub-directory
git init: create a new, empty repository in that sub-directory
git remote add origin https://github.com/abelbraaksma/visualfsharp: add the remote named origin (this configures a few settings for it as well)
git fetch origin: obtain all their commits
git checkout somebranch, where somebranch is usually master: creates a local branch name from one of the origin/* names and makes that the current branch.

When these are done you are back in your original directory (i.e., still /D/Projects/OpenSource/VisualFSharp2). Note that your original directory is one Git repository, and its visualfsharp subdirectory is another.
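Spelled out as commands, including the git init that git clone performs for you, the sequence looks roughly like this. It is a sketch, not what git clone literally runs: a throwaway local repository (SRC, a name I made up) stands in for the fork's URL so it works offline.

```shell
set -eu
tmp=$(mktemp -d)
SRC="$tmp/source"             # stand-in for https://github.com/abelbraaksma/visualfsharp
git init -q "$SRC"
git -C "$SRC" symbolic-ref HEAD refs/heads/master
git -C "$SRC" -c user.name=Demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "first commit"

cd "$tmp"
mkdir visualfsharp            # create the sub-directory
cd visualfsharp               # enter it
git init -q                   # create a new, empty repository there
git symbolic-ref HEAD refs/heads/master
git remote add origin "$SRC"  # add the remote named origin
git fetch -q origin           # obtain all their commits
git checkout -q master        # create master from origin/master and check it out
cd "$tmp"                     # git clone leaves you in the directory you started in

# Shows the current branch and the remote-tracking name it is paired with.
git -C visualfsharp status -sb
```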

We're going to see you do most of these commands again now, but this time, applied to your currently-empty repository, which is in that weird state where you're on master but master does not exist.

git fetch https://github.com/Microsoft/visualfsharp

This calls up the Git at https://github.com/Microsoft/visualfsharp and obtains commits and other objects from them, into your previously empty repository (not the clone you just made!). It's like git fetch remote except that there are no remote-tracking names (no origin/* or upstream/*), since there's no remote to use to construct such names. This particular form of git fetch dates back to ancient times (2005), before the invention of git remote, and one probably should never use it. It's not harmful; it's just not helpful here.
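Here is a small sketch of what a URL-only fetch leaves behind. A throwaway local repository (UPSTREAM, a made-up name) stands in for the Microsoft URL. The one detail the text above does not mention is FETCH_HEAD, the special name under which Git records what it just fetched.

```shell
set -eu
tmp=$(mktemp -d)
UPSTREAM="$tmp/upstream"      # stand-in for https://github.com/Microsoft/visualfsharp
git init -q "$UPSTREAM"
git -C "$UPSTREAM" symbolic-ref HEAD refs/heads/master
git -C "$UPSTREAM" -c user.name=Demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "first commit"

mkdir "$tmp/work"
cd "$tmp/work"
git init -q

# Fetch by URL (here, a path): the commits arrive, but no remote is involved.
git fetch -q "$UPSTREAM"

# The fetched tip is recorded only in FETCH_HEAD...
fetched=$(git rev-parse FETCH_HEAD)
echo "FETCH_HEAD: $fetched"

# ...and no remote-tracking names were created.
tracking=$(git branch -r)
echo "remote-tracking names: [$tracking]"
```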

git remote add upstream https://github.com/Microsoft/visualfsharp
git remote add origin https://github.com/abelbraaksma/visualfsharp

These are fine: they set up two remotes. A remote is just a short name that:

saves a URL, and
provides the leading part of the remote-tracking names, upstream/* and origin/* respectively.

git fetch upstream

This is almost a repeat of your earlier git fetch. This time, though, your Git uses the name you assigned—upstream—to get the URL. So your Git calls up the Git at https://github.com/Microsoft/visualfsharp again. Your Git gets from them any new commits (and any other necessary Git objects to go with those commits) since the last fetch—probably none, depending on just how long you went between the first fetch and this second one. If you had not run the earlier git fetch, this would get every Git object while getting all the commits.

But now, having gotten the commits, there's a critical difference: your Git takes all of their branch names and renames them to your remote-tracking names spelled upstream/whatever. It can do this now because now you're using a remote, not just a raw URL. The remote—the literal string upstream—gets you this renaming.¹ So your Git and their Git very quickly transfer all of the new objects (probably none), and then your Git sets up your upstream/master and so on, based on their master and so on.
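To see the renaming happen, here is a sketch along the same lines, again with a throwaway local repository (UPSTREAM is a made-up name) standing in for the Microsoft URL. The remote.upstream.fetch setting it prints is the rule that git remote add configured for the remote.

```shell
set -eu
tmp=$(mktemp -d)
UPSTREAM="$tmp/upstream"      # stand-in for https://github.com/Microsoft/visualfsharp
git init -q "$UPSTREAM"
git -C "$UPSTREAM" symbolic-ref HEAD refs/heads/master
git -C "$UPSTREAM" -c user.name=Demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "first commit"

mkdir "$tmp/work"
cd "$tmp/work"
git init -q
git remote add upstream "$UPSTREAM"

# The rule that maps their branch names onto your upstream/* names.
refspec=$(git config --get remote.upstream.fetch)
echo "$refspec"

# Fetching through the remote name applies that rule.
git fetch -q upstream
git branch -r                 # now lists upstream/master
```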

git checkout master

This is where the repository's weird state comes in. Your Git will say:

Branch master set up to track remote branch master from upstream.
Already on 'master'

What happened is that git checkout looked for master and did not find it (because you have no branches), so it created one. First it looked through all your remote-tracking names, upstream/* in this case. It found one that matched: master vs upstream/master. So it created your master, pointing to the same commit as your upstream/master. It then also set up your master to have upstream/master as its upstream setting.

After doing all that (creating your master), git checkout tried to put you onto your master, found that you were already on your master, and printed that confusing "already on" message. Still, it attached your HEAD properly in the process, checking out all the files, i.e., copying them to the index and to the work-tree.
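You can check both effects (the new branch and its upstream setting) afterwards. A sketch, with a throwaway local repository (UPSTREAM, a made-up name) standing in for the Microsoft URL; the exact wording of the checkout messages may differ between Git versions.

```shell
set -eu
tmp=$(mktemp -d)
UPSTREAM="$tmp/upstream"      # stand-in for https://github.com/Microsoft/visualfsharp
git init -q "$UPSTREAM"
git -C "$UPSTREAM" symbolic-ref HEAD refs/heads/master
git -C "$UPSTREAM" -c user.name=Demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "first commit"

# An empty repository, "on" a master branch that does not exist yet.
mkdir "$tmp/work"
cd "$tmp/work"
git init -q
git symbolic-ref HEAD refs/heads/master
git remote add upstream "$UPSTREAM"
git fetch -q upstream

# No local master exists, so Git creates it from upstream/master.
git checkout master

# Which remote-tracking name is set as master's upstream?
tracked=$(git rev-parse --abbrev-ref 'master@{upstream}')
echo "master's upstream: $tracked"

# Both names resolve to the same commit hash ID.
git rev-parse master upstream/master
```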

You may or may not want your master set up this way—you are more likely to want your master to start out pointing to the same commit as your origin/master, once you create an origin/master, and to have origin/master set as its upstream. For more about what an upstream is—i.e., what it means to have one branch set to track² another branch—see, e.g., my answer to How to setup multiple tracking levels of branchs with git.

Your last command here was:

git merge upstream/master

Your own master was just created from your upstream/master, so there is nothing to merge: the two names both point to the same commit hash ID.

You have yet to fetch anything from your origin. You probably should do that now:

git fetch origin

Once you do, you will have origin/master as well as upstream/master.³ If you wish, as I suspect, to have your own master track origin/master rather than upstream/master (and to start from there), you should:

make sure there is nothing to commit (there should not be, given the above sequence, but it's always wise to check before using git reset --hard);
run git reset --hard origin/master to make your master point to the same commit as origin/master; and
run git branch --set-upstream-to=origin/master master to change the upstream setting.

Now you can run git merge upstream/master. If the upstream has new commits since your own fork occurred, that will merge those commits, using either a full merge if required, or a fast-forward not-really-a-merge operation if possible.
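Put together, the repair might look like the sketch below. It first recreates the state you are in, using two throwaway local repositories (UPSTREAM and FORK, made-up names) in place of the GitHub URLs, with one upstream commit made after the fork. The final git push is not one of the steps above; I include it only because the question title asks about pushing.

```shell
set -eu
tmp=$(mktemp -d)
UPSTREAM="$tmp/upstream"      # stand-in for the Microsoft repository
FORK="$tmp/fork.git"          # stand-in for your fork on GitHub
upstream_commit() {
    git -C "$UPSTREAM" -c user.name=Demo -c user.email=demo@example.com \
        commit -q --allow-empty -m "$1"
}
git init -q "$UPSTREAM"
git -C "$UPSTREAM" symbolic-ref HEAD refs/heads/master
upstream_commit "first commit"
git clone -q --bare "$UPSTREAM" "$FORK"   # the fork is made here...
upstream_commit "second commit"           # ...and upstream then moves on

# Recreate the current state: master created from upstream/master.
mkdir "$tmp/work"
cd "$tmp/work"
git init -q
git symbolic-ref HEAD refs/heads/master
git remote add upstream "$UPSTREAM"
git remote add origin "$FORK"
git fetch -q upstream
git checkout -q master

# The fix: fetch the fork, check for uncommitted work, then repoint master.
git fetch -q origin
[ -z "$(git status --porcelain)" ]        # stops the script if anything is uncommitted
git reset -q --hard origin/master
git branch -q --set-upstream-to=origin/master master

# Bring in upstream's newer commit (a fast-forward here), then update the fork.
git merge -q upstream/master
git push -q origin master                 # my addition, not one of the steps above
```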

In any case you probably want to delete the extra repository.

¹The underlying mechanism by which Git achieves this renaming is horribly complicated, probably for historic reasons, but in normal practice it's just "change their master to your remote/master" and so on.

²Note that Git uses yet more confusing terminology here: if a branch name tracks a remote-tracking name (which is a local name your Git creates based on a name found in another Git whose URL is found via a remote) then we call that the branch's (or branch name's) upstream. This is all completely different from tracked vs untracked files. Yikes!

³I am assuming here that the Git repository at https://github.com/abelbraaksma/visualfsharp is your own, and that you created it using GitHub's "fork a repository" clicky buttons in their Web GUI interface. When you did that, GitHub did a somewhat complicated git clone on GitHub itself, creating your repository there from whatever source repository you chose. That means that your GitHub repository has all the same branches as the original source repository.

(The clone GitHub makes does not rename branches. It also has special GitHub-only features set up to allow the GitHub-provided pull request facility; this is not part of Git. The GitHub folks also arrange to share underlying on-disk objects behind the scenes and have all kinds of other tricks to make this a lot faster than it would be if it were done naively. So it's a regular clone in principle, but they have tweaked it to make it more useful via their web interface. That's how they get you to use GitHub instead of just doing it all yourself.)

Thanks for this awesome answer! More than I could've ever wished for, altogether a nice tutorial from my "how not to do it" to your "how to do it and why". — Abel, Jun 29 '18 at 13:50
_"Git uses yet more confusing terminology"_ >> amen to that! _"git fetch dates back to ancient times (2005) ... and one probably should never use it."_ >> yes, I thought it was synonymous to using the alias (remote), which now I understand is not quite the same, using the alias does add extra actions under the hood. — Abel, Jun 29 '18 at 13:51
Great answer! But why do I have to setup upstream every time i clone my fork? Shouldn't git know already of which repo I made this fork? — Sharak, Nov 15 '21 at 12:13
@Sharak: no, Git has no way to know this. Git*Hub* know it, but they don't tell Git. `git clone`'s model has no place to put that information. — torek, Nov 15 '21 at 16:27

Utkarsh Detha · Answer 2 · 2021-04-28T13:19:20.053

I do something very similar to what you did, here's how I do it:

1. Get the fork's URL.
2. Switch to the terminal and cd to the directory where you want the clone.
3. git clone fork-url-here clones the fork and sets it up as the remote origin.
4. cd fork-name/ changes into the cloned directory.
5. git remote add upstream upstream-url-here adds the original repository as the remote upstream.
6. git fetch upstream fetches all branches from upstream.
7. git checkout master: the clone already left us on the local master (which tracks origin/master), so Git just reports that. All is good; this does not indicate a problem.
8. git branch -a lists all local branches plus the remotes/origin/* and remotes/upstream/* names. One of these is going to be remotes/upstream/master (initially I used git branch, which only shows the local branches; this confused me a bit, as I could not see upstream/master in the list).
9. git merge upstream/master merges upstream/master into your current branch, the local master that tracks origin/master, thereby syncing with upstream.
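Here is a sketch of the last few steps that shows the difference between git branch and git branch -a, which is what tripped me up. Two throwaway local repositories (UPSTREAM and FORK, names I made up) stand in for upstream-url-here and fork-url-here so that it runs offline.

```shell
set -eu
tmp=$(mktemp -d)
UPSTREAM="$tmp/upstream"      # stand-in for upstream-url-here
FORK="$tmp/fork.git"          # stand-in for fork-url-here
upstream_commit() {
    git -C "$UPSTREAM" -c user.name=Demo -c user.email=demo@example.com \
        commit -q --allow-empty -m "$1"
}
git init -q "$UPSTREAM"
git -C "$UPSTREAM" symbolic-ref HEAD refs/heads/master
upstream_commit "first commit"
git clone -q --bare "$UPSTREAM" "$FORK"
upstream_commit "second commit"           # upstream is now ahead of the fork

git clone -q "$FORK" "$tmp/fork-name"
cd "$tmp/fork-name"
git remote add upstream "$UPSTREAM"
git fetch -q upstream

local_only=$(git branch)      # local branches only
all=$(git branch -a)          # local plus remotes/origin/* and remotes/upstream/*
printf 'git branch:\n%s\n\ngit branch -a:\n%s\n' "$local_only" "$all"

git merge -q upstream/master  # fast-forwards the local master to upstream's tip
```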

The issue you face arose because just before you add the upstream as a remote, you fetch from it (fourth line in your code block). That will prevent you from fetching all the branches from upstream. Other things look fine to me.

P.S.: I can see that this is an old issue, but I just thought I might help git beginners (like me), who might be in a rush and cannot read the very nice and informative answer given by torek.

Edit/Extension 1: Another thing is to force the fork's (origin) master to be at the same level as the original repo's (upstream) master.

!!!CAUTION: this WILL discard any and all commits made to your origin master!!!

If you are ABSOLUTELY SURE that you want to go through with it, then in the steps stated above, replace step 9 with the following:

git reset --hard upstream/master will make your local master (the branch that tracks origin/master) point to the same commit as upstream/master, discarding any local commits and uncommitted changes.
git push origin master --force will forcefully push that result to the remote origin, overwriting the master branch there.
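To try this out safely before touching a real fork, here is a sketch that plays the whole thing through on throwaway local repositories (UPSTREAM, FORK and the demo helper are names I made up). It first creates a commit that exists only in the fork, so you can see it disappear.

```shell
set -eu
tmp=$(mktemp -d)
UPSTREAM="$tmp/upstream"      # stand-in for the original repository
FORK="$tmp/fork.git"          # stand-in for the fork
demo() { git -c user.name=Demo -c user.email=demo@example.com "$@"; }
git init -q "$UPSTREAM"
git -C "$UPSTREAM" symbolic-ref HEAD refs/heads/master
demo -C "$UPSTREAM" commit -q --allow-empty -m "upstream: first commit"
git clone -q --bare "$UPSTREAM" "$FORK"
demo -C "$UPSTREAM" commit -q --allow-empty -m "upstream: second commit"

git clone -q "$FORK" "$tmp/fork-name"
cd "$tmp/fork-name"
git remote add upstream "$UPSTREAM"
git fetch -q upstream

# A commit that exists only in the fork; the two steps below throw it away.
demo commit -q --allow-empty -m "fork-only commit"
git push -q origin master

git reset -q --hard upstream/master   # local master now matches upstream/master
git push -q --force origin master     # overwrite the fork's master with it

# The fork's history no longer contains the fork-only commit.
git -C "$FORK" log --format=%s master
```

If other people also push to the fork, git push --force-with-lease may be a gentler choice than --force, since it refuses to overwrite commits you have not fetched yet.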

I suggested this change because I recently had to do this myself, and found that this could help someone (only if they know what they are doing). But, since it also has the potential to destroy valuable work, I have repeatedly emphasized the risk involved.

Thanks for the instructions. I've meanwhile gotten a little better in understanding the terminology and git commands, but this indeed may still help some others :) — Abel, Mar 01 '20 at 03:14
