Sometimes `git clone` results in a dirty worktree. I get it: if you commit text files with CRLF, configure `.gitattributes` to force LF, and set `core.autocrlf` to true, you are asking for trouble. But why, under the same conditions, would you sometimes get a dirty worktree and sometimes a clean one?

I'm working with Git for Windows 2.40.1 (the behavior is confirmed not to be platform or version specific in general, but this repro seems to be Windows specific). The only setting I have changed is `core.autocrlf=true`. Here is a script that repeatedly resets a repo and checks whether it's dirty. Technically, it just refreshes the index, but you can reproduce the same thing with a clean clone; it just takes longer.

#!/usr/bin/env bash
set -e

# Continuously reset a repo and check its status

REPO_URL="https://github.com/balakine/crlf.git"
REPO_DIR="crlf"
COUNT_TOTAL=0

# Clone the repository
rm -rf "$REPO_DIR"
git clone --branch master --single-branch --depth=1 "$REPO_URL" "$REPO_DIR"
cd "$REPO_DIR"

until git diff --quiet; do
    ((++COUNT_TOTAL))

    # Reset the repository
    git rm --quiet --cached -r .
    git reset --quiet --hard

    printf "\033[0;32mTotal: %s\033[0m\n" "$COUNT_TOTAL"
done

git status

It will take a while, about 5,000 tries on my machine, but you will get a clean repo at some point. Why? I have an example of a repo that clones clean about 8 times out of 100, but it's proprietary.

Iterokun
  • You've not mentioned it explicitly, but I assume that the dirty workspace is "just" CRLF/LF differences, right? – Joachim Sauer Jul 12 '23 at 11:31
  • There are settings related to EOL-format that affect how files are _checked out_ so since the moment you clone, you will get a dirty working tree if settings are placed under the _right_ (wrong?) conditions. – eftshift0 Jul 12 '23 at 11:32
  • Right, @JoachimSauer, the dirty workspace has only `crlf` diff – Iterokun Jul 12 '23 at 11:53
  • @eftshift0 the question is why is it clean _sometimes_? – Iterokun Jul 12 '23 at 11:55
  • Are global settings **the same** when you clone? – eftshift0 Jul 12 '23 at 12:14
  • Yes, @eftshift0, if you check the script it doesn't change global (or any other) settings. – Iterokun Jul 12 '23 at 12:18
  • You don't do any error handling in your script. Maybe your `git clone` / `git reset` commands are crashing most of the time? Or is their exit-code always `0`? Also: What file system is that on? Just a local NTFS drive? – Jay Jul 12 '23 at 13:33
  • Fair point, @Jay, I added error handling. Yes the file system is a local NTFS drive. – Iterokun Jul 13 '23 at 06:08

1 Answer

This can actually occur any time you have Git perform any conversion, including a line-ending conversion or a smudge/clean filter, such as with Git LFS. The reason is that Git stores certain information in the index, including a timestamp, file size, and on Unix, additional information about the device and inode.

When a file's modification time is the same as or later than the timestamp of the index file itself, Git can't tell from the cached stat data whether the file was silently updated after its entry was recorded, for example within the same second. This is called the "racy Git" problem. As a result, Git needs to re-read the file, which re-performs any conversions.
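
A rough way to see which files are candidates for this extra check is to compare each tracked file's modification time with that of the index file. The sketch below only approximates Git's own test (Git looks at the timestamp recorded in each index entry, not at the file on disk), and `list_possibly_racy` is a hypothetical helper name, not a Git command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print tracked files whose mtime is not older than the
# index file. This is an approximation of the "racily clean" condition.
list_possibly_racy() {
    local index_file path
    # Ask Git where the index lives instead of assuming .git/index.
    index_file=$(git rev-parse --git-path index) || return 1
    git ls-files -z | while IFS= read -r -d '' path; do
        # "-ot" means "older than"; negating it gives "same age or newer".
        if [[ -e "$path" && ! "$path" -ot "$index_file" ]]; then
            printf '%s\n' "$path"
        fi
    done
}
```

Run `list_possibly_racy` from inside a working tree right after a checkout; any file it prints is one whose cached stat data Git may decline to trust.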

Normally, when you perform a checkout, Git writes the index immediately thereafter, so all the files are correctly written into the index. However, if you encounter the racy Git problem for one or more files, and those files are such that the conversion Git provides would cause them to be modified (because their line endings are incorrect or they're not correctly stored as Git LFS files), then they show up as modified.
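
To find the files that fall into that category, `git ls-files --eol` reports the line endings of each file in the index and in the working tree, together with the attributes that apply. A minimal sketch, where `list_crlf_in_index` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: list tracked files whose blob in the index contains
# CRLF (or mixed) line endings. These are the files that a forced LF
# conversion would report as modified.
list_crlf_in_index() {
    # The first field of each output line is i/<eol>, describing the index copy.
    git ls-files --eol | awk '$1 == "i/crlf" || $1 == "i/mixed"'
}
```

Each printed line also shows the working tree state (`w/...`) and the attributes in effect (`attr/...`), which helps explain why a given file is affected.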

For reasons unknown to me, this tends to occur more frequently on Windows than on other OSes, but if you touch every file in the working tree (e.g., with `git ls-files -z | xargs -0 touch`), then you'll see all the affected files, since a `git status` will have to re-read and re-process every file in the working tree.
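
That check can be wrapped in a small function. This is only a sketch: `list_conversion_dirty` is a hypothetical name, and it changes the modification time of every tracked file, so use it on a scratch clone.

```shell
#!/usr/bin/env bash
# Hypothetical helper: force Git to re-read every tracked file, then show
# which ones it considers modified.
list_conversion_dirty() {
    local repo_dir="${1:-.}"
    (
        cd "$repo_dir" || exit 1
        # Bump the mtime of every tracked file so the cached stat data
        # in the index no longer matches.
        git ls-files -z | xargs -0 touch
        # One line per path; " M" in the first two columns means the
        # working tree copy differs from the index.
        git status --porcelain
    )
}
```

Calling `list_conversion_dirty path/to/clone` should give the same result on every run, unlike a plain `git status` right after cloning.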

The way to handle this is to store your files as they're supposed to be stored. For text files, that means you need to mark them as text in `.gitattributes` (`* text=auto` works for many cases) and then Git will always store them as LF in the repository and use your preferred line endings when checking out (unless you specify `eol=lf` or `eol=crlf`).
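
For a repository that already contains CRLF blobs, one way to apply this is `git add --renormalize`, which re-runs the conversion on every tracked file and restages the result. The sketch below assumes `* text=auto` suits the repository and only writes `.gitattributes` if none exists; `normalize_line_endings` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: restage all tracked files with normalized line endings.
normalize_line_endings() {
    # Assumption: "* text=auto" is appropriate here. An existing
    # .gitattributes is left untouched.
    [ -e .gitattributes ] || printf '* text=auto\n' > .gitattributes
    git add .gitattributes
    # Re-apply the "clean" conversion to every tracked file and restage it.
    git add --renormalize .
    # Show what is staged so it can be reviewed before committing.
    git status --short
}
```

Run it from the top of the working tree, review the staged changes, and commit them; after that commit, fresh clones no longer depend on timing to come out clean.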

bk2204
  • Interesting. Pure shot in the dark speculations on what might cause this on Windows would be: different mtime resolution on NTFS compared to common Linux OS, or alternatively git exec's some command for some of the steps and that's slower on Windows than on Linux (but I don't think it does, by default, unless some external tools are configured). – Joachim Sauer Jul 13 '23 at 08:54
  • @JoachimSauer, the issue is not purely Windows. I've just reproduced it on a Mac (Intel, Ventura 13.4.1, command line tools for Xcode 14.3) with all default settings (no `core.autocrlf`, no `core.eol`). Just run the script and wait for it to finish once it gets a "racily clean" worktree. – Iterokun Jul 14 '23 at 08:58
  • Yeah, I understand that it's not exclusive to Windows, but the answer mentions that it's more frequent on Windows and I was just speculating what property of that OS could make this true (if it's true at all). – Joachim Sauer Jul 14 '23 at 09:54