I'm using text files as a very simple database; they just contain NUL-separated strings. I want to manage these files with Git, as text.

Here is a sample file (^@ means NUL):

Tlrl-ng tl tlu
^@aget does not wait for all forked processed. Probably unsolable unless we invoke zsh -c
^@webhook siri
^@login as peter

If this is not possible, then what character should I use instead of NUL? The records can be multiline.

Please note that I need git-merge to treat these files as normal text. I use the gitattributes line `* merge=union diff text`.

HappyFace
  • Add `text` to the gitattributes entry for the file, and Git will feed the file to `diff` and attempt to display it. Note that the NULs may not display well and may confuse other text-oriented programs. If I needed multiline records that could include all bytes, I might just encode the entire file in some fashion, rather than putting raw bytes into the file... – torek Aug 10 '19 at 17:43
  • @torek Like ‘* text’? – HappyFace Aug 10 '19 at 17:45
  • You can do that. But it's more efficient to list all the attributes on one line: `* text merge=union`, for instance. Worth considering: union merge is a pretty dumb merge algorithm, so you should use it with care. – torek Aug 10 '19 at 17:47
  • @torek It didn't work. `warning: Cannot merge binary files: attic/.attic_todo ((null) vs. (null)) Auto-merging attic/.attic_todo`. – HappyFace Aug 10 '19 at 18:49
  • Interesting: it's a bit surprising that union merge didn't allow the NUL bytes. Not, perhaps, completely surprising, especially if it's written in C and uses NUL-terminated strings. Encoding the file will fix everything, though. – torek Aug 10 '19 at 22:05

1 Answer

If this is not possible, then what character should I use instead of NUL? The records can be multiline.

The best way to do it might be to COBS-encode your data as a single packet. You could add 11 to all lengths (increasing the byte-stuffing overhead by a tiny fraction of a percent) so that a length byte is never punned with a newline.
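As a rough illustration of that idea, here is a minimal Python sketch of COBS with an optional offset added to every length byte. The function names `cobs_encode` and `cobs_decode` and the `offset` parameter are hypothetical, and treating "add 11 to all lengths" as "shift each length code by 11 and shorten the maximum block to match" is one possible reading, not a standard variant. With `offset=0` it behaves like plain COBS without the trailing zero delimiter.

```python
def cobs_encode(data: bytes, offset: int = 0) -> bytes:
    """COBS-encode data as one packet, with no trailing zero delimiter.

    offset is an assumption of this sketch: it is added to every length
    code, so offset=11 keeps length bytes in 12..255 (never NUL or LF).
    """
    if not 0 <= offset <= 253:
        raise ValueError("offset must be in 0..253")
    max_code = 0xFF - offset          # largest unshifted length code
    out = bytearray([0])              # placeholder for the first length code
    code_idx, code = 0, 1
    for b in data:
        if b == 0:
            # A zero ends the current block; it is implied by the length code.
            out[code_idx] = code + offset
            code_idx, code = len(out), 1
            out.append(0)
        else:
            out.append(b)
            code += 1
            if code == max_code:
                # Block is full: close it without an implied zero.
                out[code_idx] = code + offset
                code_idx, code = len(out), 1
                out.append(0)
    out[code_idx] = code + offset
    return bytes(out)


def cobs_decode(enc: bytes, offset: int = 0) -> bytes:
    """Invert cobs_encode; offset must match the value used for encoding."""
    max_code = 0xFF - offset
    out = bytearray()
    i = 0
    while i < len(enc):
        code = enc[i] - offset
        if code < 1:
            raise ValueError("invalid length byte")
        i += 1
        block = enc[i:i + code - 1]
        if len(block) != code - 1:
            raise ValueError("truncated input")
        out += block
        i += code - 1
        # A non-full block that is not the last one implies a zero byte.
        if code != max_code and i < len(enc):
            out.append(0)
    return bytes(out)


if __name__ == "__main__":
    records = b"webhook siri\n\x00login as peter\n"
    encoded = cobs_encode(records, offset=11)
    print(b"\x00" in encoded)                          # False
    print(cobs_decode(encoded, offset=11) == records)  # True
```

Newlines inside the records pass through unchanged, so a line-based merge still sees the original line breaks, and the encoded file contains no NUL bytes. The length bytes can be any value from 12 to 255, so the result is not guaranteed to be valid UTF-8.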

jthill
  • Given that he wants to union-merge the files, plus the observation that union merge doesn't like NUL bytes, I'd recommend some other encoding. It probably should re-encode newlines. – torek Aug 10 '19 at 22:08
  • @torek COBS eliminates nulls in the output encoding; "as a single packet" means there'd be no nulls in the file at all, the trailing 0's presence is implicit, like the leading 1 in floating point values. How you avoid the newline punning in length bytes is arbitrary, add 11, skip 10, whatever. – jthill Aug 10 '19 at 22:15
  • Oh, right, for some reason I was thinking that the NULs were in his data too. – torek Aug 10 '19 at 22:27