gitk looks at the cache information in the index to determine whether your working directory is dirty. The index stores information about the state of the working directory so that git does not need to re-read and analyze every file.
When you run git status, it compares the contents of HEAD to the contents of the index to show staged changes. This is simple and quick: if a file's object ID differs between the two, its contents must differ. Determining whether a file has unstaged changes is more costly. The file must be read and its SHA1 computed, and that hash is then compared to the value in the index.
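For reference, the ID git records for a file's contents is the SHA-1 of a short header ("blob", a space, the size in bytes, a NUL byte) followed by the file's raw bytes. The Python sketch below computes it. The function names are made up for illustration, and the sketch ignores clean filters, line-ending conversion, and repositories that use the SHA-256 object format.

```python
import hashlib


def git_blob_sha1(data: bytes) -> str:
    """Hash bytes the way git names a blob: SHA-1 over "blob <size>\\0" + data."""
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


def git_blob_sha1_of_file(path: str) -> str:
    # Reading the whole file is the expensive step that the stat cache avoids.
    with open(path, "rb") as f:
        return git_blob_sha1(f.read())


if __name__ == "__main__":
    print(git_blob_sha1(b""))         # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
    print(git_blob_sha1(b"hello\n"))  # ce013625030ba8dba906f756967f9e9ca394464a
```

The two printed values should match what git hash-object reports for an empty file and for a file containing "hello" followed by a newline.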
To avoid this costly computation, git caches the struct stat information for the working directory contents in the index:
README.md
  ctime: 1516120578:638662531
  mtime: 1516120578:638662531
  dev: 16777220 ino: 1752439
  uid: 501 gid: 20
  size: 13224 flags: 0
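A rough Python model of one such cached entry can be built from os.stat. This is only a sketch: StatInfo and its field names are hypothetical, and git's real index packs these values into fixed-width binary fields rather than a Python object. The seconds_nanos helper renders a nanosecond timestamp in the same "seconds:nanoseconds" form as the listing above.

```python
import os
import tempfile
from dataclasses import dataclass


@dataclass(frozen=True)
class StatInfo:
    """Simplified model of the per-file stat data cached in the index."""
    ctime_ns: int
    mtime_ns: int
    dev: int
    ino: int
    uid: int
    gid: int
    size: int

    @classmethod
    def from_path(cls, path: str) -> "StatInfo":
        st = os.stat(path)
        return cls(st.st_ctime_ns, st.st_mtime_ns, st.st_dev, st.st_ino,
                   st.st_uid, st.st_gid, st.st_size)

    @staticmethod
    def seconds_nanos(ns: int) -> str:
        # Split a nanosecond timestamp into "seconds:nanoseconds".
        return f"{ns // 10**9}:{ns % 10**9}"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "README.md")
        with open(path, "wb") as f:
            f.write(b"hello\n")
        info = StatInfo.from_path(path)
        print("ctime:", StatInfo.seconds_nanos(info.ctime_ns))
        print("mtime:", StatInfo.seconds_nanos(info.mtime_ns))
        print("dev:", info.dev, "ino:", info.ino)
        print("uid:", info.uid, "gid:", info.gid)
        print("size:", info.size)
```

Note that atime is deliberately absent: merely reading a file must not make it look modified.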
Now, when you run git status, it can simply stat the files in the working directory. If a file has the same size, inode, ctime, mtime, etc. as its cached entry, git assumes that the file has not changed. This keeps git status fast when most files are unchanged. But if any of these values differ, git hashes the file. If the hash is unchanged (i.e., you've simply run touch on the file without changing its contents), the index is updated with the new cache information. If you've actually changed the file, git status reports the unstaged change.
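The decision described above can be sketched in Python as follows. This is a simplified model, not git's implementation. stat_key, blob_oid, check and demo are hypothetical names, an index entry is assumed to be a plain (oid, stat) tuple, and touch is emulated by pushing mtime ten seconds forward so the change is visible even on filesystems with coarse timestamps.

```python
import hashlib
import os
import tempfile


def stat_key(path):
    """Simplified stand-in for the cached stat fields."""
    st = os.stat(path)
    return (st.st_ctime_ns, st.st_mtime_ns, st.st_dev, st.st_ino,
            st.st_uid, st.st_gid, st.st_size)


def blob_oid(path):
    """git-style blob SHA-1 of the raw bytes (no filters or EOL conversion)."""
    with open(path, "rb") as f:
        data = f.read()
    return hashlib.sha1(b"blob %d\0" % len(data) + data).hexdigest()


def check(path, entry):
    """Classify one file against its cached (oid, stat) entry.

    Returns (state, entry); the entry may carry refreshed stat info.
    """
    oid, cached_stat = entry
    current = stat_key(path)
    if current == cached_stat:
        return "clean", entry              # fast path: file is never read
    if blob_oid(path) == oid:
        return "clean", (oid, current)     # only metadata changed: refresh
    return "modified", entry               # content really changed


def demo():
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "foo")
        with open(path, "wb") as f:
            f.write(b"hello\n")
        entry = (blob_oid(path), stat_key(path))
        untouched, _ = check(path, entry)

        # Emulate `touch foo`: move mtime forward, keep the content.
        st = os.stat(path)
        os.utime(path, ns=(st.st_atime_ns, st.st_mtime_ns + 10**10))
        touched, refreshed = check(path, entry)

        with open(path, "wb") as f:
            f.write(b"goodbye\n")
        edited, _ = check(path, refreshed)
        return untouched, touched, refreshed != entry, edited


if __name__ == "__main__":
    print(demo())  # ('clean', 'clean', True, 'modified')
```

Only the touched case pays for a hash. The untouched case is answered from metadata alone, which is what keeps status fast on large trees.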
gitk, however, does not bother to hash the file to determine whether it has truly changed. You can see this yourself with a trivial example. Here I have a repository with one file, foo, with no changes.

If I touch the file on the command line, updating its timestamp:
% touch foo
Now, gitk reports my repository as having uncommitted changes:

However, if I run git status on the command line again, it updates the cache information in the index, and gitk now sees that there really aren't any unstaged changes:
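The walkthrough above can be modeled in Python. stat_only_dirty is a hypothetical model of the stat-only check this answer attributes to gitk, not gitk's actual code. refresh is a hypothetical model of git status updating one cached entry, which rehashes only when the stat data no longer matches.

```python
import hashlib
import os
import tempfile


def stat_key(path):
    st = os.stat(path)
    return (st.st_ctime_ns, st.st_mtime_ns, st.st_dev, st.st_ino,
            st.st_uid, st.st_gid, st.st_size)


def blob_oid(path):
    with open(path, "rb") as f:
        data = f.read()
    return hashlib.sha1(b"blob %d\0" % len(data) + data).hexdigest()


def stat_only_dirty(path, entry):
    """Hypothetical stat-only check: any metadata mismatch counts as dirty."""
    _oid, cached_stat = entry
    return stat_key(path) != cached_stat


def refresh(path, entry):
    """Model of git status refreshing one entry: rehash only on a stat miss."""
    oid, cached_stat = entry
    current = stat_key(path)
    if current != cached_stat and blob_oid(path) == oid:
        return (oid, current)  # same content: store the new stat info
    return entry


def touch_walkthrough():
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "foo")
        with open(path, "wb") as f:
            f.write(b"hello\n")
        entry = (blob_oid(path), stat_key(path))
        before = stat_only_dirty(path, entry)        # no changes yet

        st = os.stat(path)                           # emulate `touch foo`
        os.utime(path, ns=(st.st_atime_ns, st.st_mtime_ns + 10**10))
        after_touch = stat_only_dirty(path, entry)   # metadata mismatch

        entry = refresh(path, entry)                 # like running git status
        after_refresh = stat_only_dirty(path, entry)
        return before, after_touch, after_refresh


if __name__ == "__main__":
    print(touch_walkthrough())  # (False, True, False)
```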

When you untarred your repository along with its working directory, you put a working directory on disk that doesn't match the cache information in the index, because the extracted files have new inodes and ctimes. git would rehash the contents and determine that your working directory is not, in fact, dirty, but gitk does not.
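A small Python experiment can approximate the untar case. shutil.copytree is used as a stand-in for extracting an archive, which is an assumption: like tar extraction, it preserves mtime (via copy2) but creates new files with new inode numbers. The helper names are hypothetical. The stat comparison against the original directory's cached entry fails, while the content hash still matches.

```python
import hashlib
import os
import shutil
import tempfile


def stat_key(path):
    st = os.stat(path)
    return (st.st_ctime_ns, st.st_mtime_ns, st.st_dev, st.st_ino,
            st.st_uid, st.st_gid, st.st_size)


def blob_oid(path):
    with open(path, "rb") as f:
        data = f.read()
    return hashlib.sha1(b"blob %d\0" % len(data) + data).hexdigest()


def copy_scenario():
    """Cache stat info for one directory, then check a copy against it."""
    with tempfile.TemporaryDirectory() as root:
        src = os.path.join(root, "original")
        os.mkdir(src)
        with open(os.path.join(src, "foo"), "wb") as f:
            f.write(b"hello\n")
        src_foo = os.path.join(src, "foo")
        cached = (blob_oid(src_foo), stat_key(src_foo))

        # New files, new inodes; copy2 keeps mtime but not inode or ctime.
        dst = os.path.join(root, "copy")
        shutil.copytree(src, dst)
        dst_foo = os.path.join(dst, "foo")

        oid, st = cached
        stat_says_dirty = stat_key(dst_foo) != st
        hash_says_dirty = blob_oid(dst_foo) != oid
        return stat_says_dirty, hash_says_dirty


if __name__ == "__main__":
    print(copy_scenario())  # (True, False)
```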
It is generally not a good idea to copy a git repository together with its working directory; instead, you should check out a new working directory.