
TL;DR: I configured a difftool, and git-diff gives "intelligent" diffs, but git-add creates "stupid" hunks. Why?

I configured the difftool to use nbdime with nbdime config-git --enable --global, which I think essentially just adds these lines to my .gitconfig:

[diff "jupyternotebook"]
    command = git-nbdiffdriver diff
[merge "jupyternotebook"]
    driver = git-nbmergedriver merge %O %A %B %L %P
    name = jupyter notebook merge driver
[difftool "nbdime"]
    cmd = git-nbdifftool diff \"$LOCAL\" \"$REMOTE\" \"$BASE\"
[difftool]
    prompt = false
[mergetool "nbdime"]
    cmd = git-nbmergetool merge \"$BASE\" \"$LOCAL\" \"$REMOTE\" \"$MERGED\"
[mergetool]
    prompt = false
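A `[diff "jupyternotebook"]` section is only used for paths whose `diff` attribute names that driver, for example through a gitattributes line such as `*.ipynb diff=jupyternotebook` (nbdime's setup presumably writes something like this alongside the .gitconfig entries). `git check-attr diff -- <path>` reports which driver applies to a path. The Python sketch below wraps that command; the helper names are hypothetical. Note that an external diff driver is run by porcelain `git diff`, which is consistent with `git add -e` still showing the raw JSON.

```python
import subprocess


def driver_from_check_attr(output: str) -> str:
    """Extract the value from `git check-attr diff -- PATH` output.

    Each output line looks like '<path>: diff: <value>', where <value> is a
    driver name such as 'jupyternotebook', or 'set', 'unset', 'unspecified'.
    """
    line = output.strip().splitlines()[0]
    # Split from the right so a path that itself contains ': ' is harmless.
    return line.rsplit(": ", 1)[1]


def diff_driver_for(path: str) -> str:
    """Ask Git which diff driver applies to PATH (run inside a work tree)."""
    result = subprocess.run(
        ["git", "check-attr", "diff", "--", path],
        capture_output=True, text=True, check=True,
    )
    return driver_from_check_attr(result.stdout)


# Usage, from inside the repository:
#   print(diff_driver_for("FOLDER/FILE.ipynb"))
```

If this prints `unspecified`, Git has no driver for the file and even `git diff` would fall back to a plain line diff.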

Now git diff gives the good output I expect:

nbdiff /var/folders/6b/03yw1pts2nx_q8vftrh6fv140000gp/T//FILE.ipynb FOLDER/FILE.ipynb
--- /var/folders/6b/03yw1pts2nx_q8vftrh6fv140000gp/T//FILE.ipynb  2022-05-17 14:29:39.937318
+++ FOLDER/FILE.ipynb  2022-05-17 14:09:45.222229
## inserted before /cells/0:
+  code cell:
+    source:
+      ...
+  markdown cell:
+    source:
+      ...

## deleted /cells/0:
-  markdown cell:
-    source:
-      ...

## inserted before /cells/2:
+  code cell:
+    source:
+      ...

But if I do git add -e FOLDER/FILE.ipynb, it gives me a "really bad" diff:

diff --git a/FOLDER/FILE.ipynb b/FOLDER/FILE.ipynb
index 3a1540c..17363f8 100644
--- a/FOLDER/FILE.ipynb
+++ b/FOLDER/FILE.ipynb
@@ -1,621 +1,716 @@
 {
- "cells": [
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    ...
-   ]
-  },
-  ... almost every line in the file is removed
+  "cells": [
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "id": "j1qKT6qtAYEj"
+      },
+      "outputs": [],
+      "source": [
+        ...
+      ]
+    },
+    ... almost every line in the file is added back

I may have a fundamental misunderstanding of what git-add does, but why isn't git add using the nbdime diff tool? And is there a way I can add just the changes that I see in git-diff?

gchen

1 Answer


Both git add -e and git add -p need to be able to understand an edited diff. They have only a limited comprehension of diffs in general, and they require the "dumb" format from plain git diff. The nbdime tools take the original files apart, reshuffle them into usable text, and diff that usable text,¹ but that's not what's actually in the files, and git add -e needs to work on what's in the files, not on some cleaned-up presentation of them.


¹ What's in the files is machine-readable JSON. The output of the nbdime tools appears to be YAML. If Git had a native JSON diff engine, git add -p and company would be able to deal with the result, but Git doesn't, so they can't. If Jupyter notebooks used YAML, Git's line-oriented tools would be able to deal with them, but Jupyter notebooks don't, so they can't.
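The gap between "what's in the files" and nbdime's structural view can be reproduced with a toy example. The minimal Python sketch below uses a made-up notebook (not the asker's file) and compares a structural comparison of the parsed JSON with the line counts `difflib` reports. One possible extra factor, inferred from the excerpt rather than stated in the thread: the removed lines in the question are indented by one space and the added lines by two, as if the notebook had been re-saved by a different tool. If so, a line-oriented diff has almost nothing left to match.

```python
import difflib
import json


def line_diff_counts(old_lines, new_lines):
    """Return (unchanged, removed, added) counts as a line-oriented diff sees them."""
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    unchanged = sum(block.size for block in matcher.get_matching_blocks())
    return unchanged, len(old_lines) - unchanged, len(new_lines) - unchanged


# Hypothetical toy notebook with a single markdown cell.
old_nb = {
    "cells": [{"cell_type": "markdown", "metadata": {}, "source": ["# Title"]}],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
}

# The "edit": insert one code cell in front of the existing cell.
new_nb = json.loads(json.dumps(old_nb))  # cheap deep copy
new_nb["cells"].insert(0, {
    "cell_type": "code", "execution_count": None,
    "metadata": {}, "outputs": [], "source": ["print(1)"],
})

# Structurally, one cell was added and the old cell is untouched.
structural_change = old_nb["cells"] == new_nb["cells"][1:]

# Serialized with the same indentation, a line diff sees only additions.
same_indent = line_diff_counts(json.dumps(old_nb, indent=1).splitlines(),
                               json.dumps(new_nb, indent=1).splitlines())

# Re-serialized with a different indentation, almost every line "changes".
mixed_indent = line_diff_counts(json.dumps(old_nb, indent=1).splitlines(),
                                json.dumps(new_nb, indent=2).splitlines())

print("old cell unchanged structurally:", structural_change)
print("same indent  (unchanged, removed, added):", same_indent)
print("mixed indent (unchanged, removed, added):", mixed_indent)
```

In the same-indent case no lines are removed; in the mixed-indent case only the outer `{` and `}` survive, which resembles the "almost every line removed and added back" hunk in the question.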

torek
  • `git add -e` opens `$(git var GIT_EDITOR)`, whereas `git add -p` shows output in the terminal. So for the first one you would need a notebook-specific editor, but the second one could, in theory, use something notebook-specific via [`interactive.diffFilter`](https://git-scm.com/docs/git-config#Documentation/git-config.txt-interactivediffFilter), as long as it keeps a 1-to-1 correspondence in the number of lines... – philb May 17 '22 at 20:57
  • So hypothetically, if nbdime could produce output in the same "dumb" plain format as `git diff`, would `git add` be able to use it? And if so, would it automatically know to use that diff tool (the one that outputs the "dumb" format), or is there another configuration setting that specifies which diff `git add` uses? – gchen May 17 '22 at 21:18
  • @philb: well, but with `git add -p`, the Perl code splits out each hunk and then gives you the opportunity to do something with it (including edit it). The editing step is like the `git add -e` here. (The result is run through `git apply` with the `--recount` option.) – torek May 17 '22 at 23:31
  • @gchen: Not really, no: the problem is that the output from `nbdime` is a set of changes to *different files*. If nbdime itself had a way to take edited patches and turn them back into the *original files*, that would work. But now instead of `git add -e` you'd run some nbdime tool, edit, and run another nbdime tool: Git itself would never even be invoked here! (Except maybe behind the scenes, *by* the nbdime tool.) – torek May 17 '22 at 23:32
  • By the way, the fact that there is a `git-nbmergedriver` in this package means that it (the package) contains everything needed to *build* a proper partial-patch-applier. It just needs some work to give you a front end: something that, like the Git `add -e`/`add -p` code, would let you edit the patch and use that edited patch to come up with the internal changes to apply, which you'd then feed into that same merge driver code. So it's maybe 80 to 90% of the way done. You would have to write some code to finish it. – torek May 18 '22 at 00:18
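philb's `interactive.diffFilter` idea only changes what `git add -p` displays. Git still stages hunks from its own unfiltered JSON diff, so torek's point about editing still holds. The minimal Python sketch below shows such a filter, assuming Python 3 is on the PATH. The script name `nb_difffilter.py`, the `--filter` flag, and the 120-character cut-off are all hypothetical. It strips colour codes and shortens very long JSON lines (such as embedded base64 image outputs) while emitting exactly one line per input line, because Git refuses filter output whose lines don't correspond 1-to-1 with its own diff.

```python
import re
import sys

# SGR colour sequences such as "\x1b[32m" that Git adds to a coloured diff.
ANSI_SGR = re.compile(r"\x1b\[[0-9;]*m")

MAX_WIDTH = 120  # arbitrary display width chosen for this sketch


def filter_line(line: str, width: int = MAX_WIDTH) -> str:
    """Strip colour codes and shorten very long lines for display only."""
    plain = ANSI_SGR.sub("", line)
    if len(plain) > width:
        plain = plain[:width] + " ...[truncated]"
    return plain


def filter_stream(lines):
    """Yield exactly one output line per input line (the 1-to-1 rule)."""
    for raw in lines:
        yield filter_line(raw.rstrip("\n")) + "\n"


# Only act as a filter when explicitly asked, e.g. (hypothetical wiring):
#   git config interactive.diffFilter "python3 nb_difffilter.py --filter"
if __name__ == "__main__" and sys.argv[1:] == ["--filter"]:
    sys.stdout.writelines(filter_stream(sys.stdin))
```

Dropping the colour is a simplification; a more careful filter would preserve the escape codes and truncate only the visible text.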