12

I'm afraid my question will be a down-voted or marked duplicate. But I still want to clear my basics of git. There are articles on the internet, but since I'm only a few days old in git, I'm confused here:

enter image description here

Clearly, Repository and Workspace are two different things. So I googled the difference between the two. Here is the answer: A local repository is a directory within your workspace. I did not understand this in the light of git. Here is a similar question on StackOverflow. But I did not understand the answer very well. Then I tried studying this: enter image description here.

Please explain to me this as simple as possible (*assume you're explaining this to a 10-year-old kid). I may not understand heavy git jargon.

funnydman
  • 9,083
  • 4
  • 40
  • 55
Tanzeel
  • 4,174
  • 13
  • 57
  • 110
  • 1
    Think of the red Workspace box in the first diagram as a moving box. Originally, it is on top of the Repository box when you do a `git clone` because there is no difference between the two. Think of the Repository as a Safe/Lock Box where you keep valuables. It only gets updated when you commit changes to it. Until then, all changes you make locally are in the Workspace and the Workbase box moves to the right. – Frak Dec 22 '19 at 16:45

5 Answers5

32

The repository is essentially the .git hidden folder inside the working directory (workspace).

The working directory (workspace) is essentially your project folder. 

Keep in mind these are basically synonyms: workspace = working directory = project folder

Let me show the difference with an example.

Step 1: Creating a working directory/workspace

Assume you create a project folder locally on your computer. I'm going to call mine amazing-project.

Then what you have is this, a project folder called 'amazing-project'.

enter image description here

At this point amazing-project is NOT a repository.  Right now it is just a working directory (workspace). It is just a project folder.

Step 2: Creating a repository

Now assume you want to make amazing-project into a repository so that you have a local repository. 

In order to do this you will need to use the 'git init' command.   This command will create a hidden folder inside your amazing-project folder called '.git'. This is essentially your repository.

Now what you have is this:

enter image description here

Step 3: The working directory and repository in depth

Now let's look at the workspace and repository in more depth.   We can distinguish between the two.

enter image description here

The amazing-project folder like we said represents our working directory:

enter image description here

And the .git folder represents our repository:

enter image description here

And actually within our repository there are in a way two important place to keep in mind. The staging area and the commit history.

enter image description here

Step 4: Adding a file 

Now let's add the first file, I'll call it example.txt.

So using our diagram this is:

enter image description here

So, the file is in our working directory (workspace).   But this file is NOT part of our repository. This file is NOT yet being version controlled.

Step 5: Adding the file to our repository 

In order for this file to come part of our repository and to start being version controlled we need to let Git know about it by adding it to the staging area using the 'git add' command.

So then what we have is:

enter image description here

The 'git add' command copies the file from the working directory to the staging area. 

WARNING: A common misconception is that the file is moved, that is not the case. The file is copied.   Now the file is being tracked by Git. And if you were to commit now it would be included in your commit.

Hopefully that explained a bit off the difference between the repository and the workspace! This video and this video were really helpful in explaining this as well as this amazing article which gives a couple more details!

Anna Skoulikari
  • 1,281
  • 10
  • 9
9

They are talking about repository and working tree. So, on a normal local directory that you are tracking with git you have the working tree (the workspace they are talking about) with all the files that make up your project.... and you have a special directory in there called .git. That's the repo with all the git stuff that makes it tick.... and the index is one of the files that is part of the repo and it (the index, aka staging area) is one of the basic features of git.

eftshift0
  • 26,375
  • 3
  • 36
  • 60
  • So when we do `git add .` . Changes will go into staging/index area. Correct me if I'm wrong. Then from there we do `git push` – Tanzeel Dec 22 '19 at 04:28
  • 1
    When you run `git add` you are adding stuff into **the index**. At that point you **haven't** created a revision. So, if you want to push stuff into another repo, you have to run `git commit -m "set the appropriate comment here"` and then you might push that.... just in case: **pushing** is not required for you to create revisions.... git is not like svn in the sense that stuff has to be committed on a central server. You can commit stuff to your local repo and you can (someday) decide to push what you have committed to another repo. – eftshift0 Dec 22 '19 at 04:32
  • One lsat thing Sir. Is `git pull` different from a `pull request.` ? How? – Tanzeel Dec 22 '19 at 04:36
  • 1
    IMO a _pull request_ is not a **real** git concept. It's the operation of asking someone else to merge stuff that you think is ready on another branch.... that is what you normally do when you set a branch on github (or other sites) on a project and then you ask other people to merge it.... or it can be as simple as asking another developer to merge stuff by sending patches... you are asking someone to "pull" code... that's why it's a _pull request_. Now `git pull` is a git operation that basically involves two steps: 1 - fetching from another repo 2 - Merging changes from a branch into yours. – eftshift0 Dec 22 '19 at 04:39
  • Which branch? The branch that the branch you are working on is set to track... and you can also use `git pull -r` (or set something on git config) so that instead of merges, git rebases when pulling. – eftshift0 Dec 22 '19 at 04:40
  • Oh. I thought I'm pulling the code. I thought pulling from my perspective. But now i know its the other way round. Thnks :-) – Tanzeel Dec 22 '19 at 04:48
  • 1
    Well... when you are "pulling" it's an operation from "your point of view".... now, when you create a "pull request", you are asking another person to "pull" from _their_ perspective. – eftshift0 Dec 22 '19 at 04:49
2

There are different ways to categorize the parts of a repository.

In particular, Git offers both bare repositories and repositories without any adjective in front. We could call the latter non-bare repositories, or repositories that include a work-tree, or something along these lines, but Git in general just uses the word repository.

In the case of a non-bare repository, your working tree—which I like to call work-tree, hyphenated, as one word, for short—contains a .git directory.1 That is, you might do:

cd proj

to enter the work-tree for your project (which you've called proj), and in proj there is a .git directory.

In the case of a bare repository, your repository would typically be named proj.git. This directory would contain exactly the same set of files that proj/.git contains in the non-bare case.

In both cases, the files inside the .git directory—whether that's proj/.git, or proj.git—make up the repository proper. The files in your work-tree, if there is a work-tree at all, are there for you to work with.


1In some special cases, it might contain a .git file, rather than a directory. You might encounter this in the future if you work with Git's submodule system. For now, you probably won't run into this case.


What's in a repository?

There is no harm in looking at the contents of a .git directory. If you do, you will find these items (and a bunch more) in modern Git. None of these are promised to exist; they might change in the future. But I think it helps to know about them:

  • A file named config. This holds the local configuration for the repository. (It is a plain-text file, so you can view it however you like.)

  • A file named HEAD. This holds the name of your current branch. You generally should not count on this, because there are conditions in which HEAD contains a raw hash ID—this is what Git calls a detached HEAD—and there are some ways to use Git where this particular file isn't always relevant.

  • A directory named objects. This is where Git stores most of its main database.

  • A directory named refs. This is sometimes where Git stores most of its secondary database—but not always.

  • A file named index (not always present, but almost always). This holds Git's index or staging area.

From the above list, we can see that a Git repository mainly consists of two databases. One—the bigger one by far in most repositories—is a database of Git objects, which wind up somewhere deep under the objects directory. Every Git object has a unique hash ID, which is a big ugly string of letters and digits like 083378cc35c4dbcc607e4cdd24a5fca440163d17. You'll see these in git log output, for instance.

The most important kind of Git object is the commit. There are three other kinds of objects. You will hardly ever deal directly with any of them, but you should know that one of them is called a tree object, one is an annotated tag object, and the last one is a blob object. Tree objects store file names, while blob objects store file content. Annotated tag objects store the extra information that goes into an annotated tag.

The second database in a Git repository consists of names, which Git calls refs or references, that hold hash IDs. Each name holds exactly one hash ID. The names are branch names, tag names, and all the other kinds of names that you can see or that Git uses internally while working.

So, this gives us a proper description of a repository. It is:

  • a database of Git objects, primarily featuring commits, that Git looks up by their hash IDs; plus
  • a database of names, such as branch and tag names, that provide hash IDs for Git to look up.

There are a few other smaller items that get added to this, such as the special HEAD, the index / staging-area, and so on, but those two databases are the bulk of any Git repository. They are also deeply involved in how git fetch and git push, which transfer commits and other Git objects between two Gits, work.

The index and your work-tree

Your work-tree, if there is a work-tree at all—if this is a non-bare repository—is mainly just something that Git will fill in from commits, when necessary. It's where you will do your work. The reason you want or need a work-tree is simple: everything in a Git commit is frozen for all time. No part of any Git commit can ever be changed: not by you, and not by Git itself. This makes the commits great for archival, but completely useless for doing actual work.

The frozen (archived) files inside each commit2 are in a compressed, read-only, Git-only format. You presumably want them as files that aren't compressed and Git-only, so Git extracts them into ordinary, everyday computer files, in your work-tree.

When you go to make a new commit, though, Git doesn't use what's in your work-tree. Instead, Git uses copies of files that are in the index.3 The index, or staging area, effectively holds a copy of every file, ready to go into the next commit. An initial git checkout of some commit fills the index / staging-area from the commit, and also copies out (and un-freezes and uncompresses) the file into your work-tree, so that the current commit, proposed next commit, and work-tree all match.

This is why you must run git add so often. If you've updated the work-tree copy, you have to have Git re-compress the updated file, storing the frozen-format, ready-for-Git copy in the index (see footnote 3 again) so that the updated version is now proposed for the next commit. That's what git add does.

When you run git commit, Git packages up everything that is in the index. The proposed next commit becomes the actual commit. This commit stores the index's copies of the files as the commit's frozen snapshot. The commit also stores some metadata, such as your name and email address and the current date-and-time. Each commit also lists a set of previous commit hash IDs, usually just one, so this new commit that git commit makes lists the hash ID of the commit you were using, just a moment ago.

The new commit you just made then becomes the current commit. The act of writing out the new commit produces the unique hash ID for the new commit, so Git now stores that hash ID into the current branch name, as recorded in the special HEAD file. You have now updated both databases: the objects database has a new commit (and perhaps new tree and blob objects too), and the name-to-hash-ID database now records the new hash ID for the current branch.


2Technically, the commit simply lists a tree object, which gives the file's names. That tree object lists more tree objects recursively if/when appropriate, and also lists blob objects, which store the files in their frozen and compressed form. But it suffices to think of the files as being stored, in their frozen form, inside the commits.

3Technically, the index doesn't contain the actual files. It only contains their names plus the hash ID of a Git blob object, plus some caching information to make Git work fast. The cache information is the source of a third name for this one thing. So it's called the index when referring to the actual file .git/index, or the staging area when referring to how you use it, or sometimes (rarely these days) the cache when referring to the cached information. As before, though, it mostly suffices to think of the index as if it holds all the files.


git fetch and git push work with the databases

When you use git fetch or git push, you have your Git call up a second Git. Each Git has its own databases: its own collection of Git objects, and its own names. The two operations are slightly different.

When you use git fetch, your Git calls up their Git. Their Git then lists out their names and the hash ID that goes with each name. Your Git checks your own objects database for each of these hash IDs. You either have the object, by that hash ID, or you don't. If you don't have it, and want it—based on the name they told you—your Git asks their Git to give your Git that object. They will do that, but will also, for a commit, tell you what the hash ID is of its parent commit. You either have that commit already, or you don't. If you don't, your Git will ask their Git to give your Git that commit. That commit has some parents too, and this repeats until they either run out of commits—they're giving you everything they have—or they reach a commit that you already have.

So, when you use git fetch, you get from them all the (interesting) commits that they have, that you don't. The "interesting" part is based on the names they used for those commits: you can either bring over all their branch names, which is the default, or you can selectively only bring over commits identified by one particular branch name.

In the end, your Git has their commits plus your own commits, and also their name or names. Your Git then creates or updates your own remote-tracking names based on their branch names. For instance, if you ran git fetch originorigin being your name for their Git repository—and they said their master is 083378cc35c4dbcc607e4cdd24a5fca440163d17, you now have that commit, so your Git sets your origin/master to the hash ID 083378cc35c4dbcc607e4cdd24a5fca440163d17.

In other words, git fetch gets commits from them and updates your remote-tracking names.4 Your remote-tracking names, like origin/master and origin/develop, now hold the hash IDs that their branch names, master and develop in this case, hold.

The git push command is a bit different. Your Git still calls up their Git, but this time, your Git offers them your commits, by hash ID. Your Git will, for instance, say I have commit a123456..., do you have it? If they don't, your Git gives their Git this commit, and offers its parent commit(s). If they don't have those, your Git gives their Git those commits too. This is basically the same as fetch, except that you're giving them your commits (that they don't have) instead of them giving you their commits (that you don't have).

At the end of all of this, though, your Git sends their Git a polite request of the form: Now, if it's OK, please set your master to a123456.... That is, after you give them your commits that you have that they don't, you ask their Git to set their branch names. You don't ask them to set some other name. This is very different from git fetch. When you ran git fetch, you got their commits, but then your Git updated your remote-tracking names. This leaves your branch names alone! But you're asking them to set their branch names.

The fact that you ask them to set branch names makes push trickier than fetch. On the other hand, the fact that they send you commits, and you update your remote-tracking names, rather than your branch names, leaves you with a problem: after git fetch, you need a second Git command to actually use their commits.

Many people like to hide this second Git command by using git pull. The git pull command was originally a simple shell script that just ran git fetch and then ran git merge. Git's merge is a big, complicated command. You won't necessarily always want to use git merge after git fetch. You can tell git pull to use a different second command, but you have to decide in advance, at the time you type in git pull, which second command to use.

I don't like this. Some of this dislike is just because, in the bad old days of Git version 1.5, git pull would sometimes wreck everything, and I had that happen to me at least once. But some is because I like to use git fetch first, then look at what fetch brought in, and only then decide whether I'm going to merge, or rebase, or do something else entirely. I can't do that with git pull: I have to commit to merge-or-rebase before I see what came in.

In some usage patterns, this early commitment, way before I know what will actually come in, is OK. So I do sometimes (rarely) use git pull anyway. Mostly, though, I use, and advise others to use, a separate git fetch followed by whatever second command is appropriate.


4Git calls these remote-tracking branch names. I think the word branch in this phrase is a bad idea—these names are just different enough from real branch names that using that word is misleading—so I leave it out now.

torek
  • 448,244
  • 59
  • 642
  • 775
1

I like to give you one tip for you. when you learning the git, always use the git bash tool. It helps to understand inside the GitHub technology, after that you can use any software to interact with git

0

This question can be explained by using thousand of lines but simply It can be explained such as below. There are two kinds of repositories. One is local and the other one is the remote one. I think You are confused with the local repository and workspace. In order to something to be repository from your workspace, you need to add to index and add commit.otherwise it is not saved to your repository. It is in your workspace.we normally do is, we add to indexes and commit to the repository, then your code gonna save to your repository.