What is the general format of Ruby "diff-lcs" diff output?

Question

The Ruby diff-lcs library does a great job of generating the changeset you need to get from one sequence to another, but the format of the output is somewhat confusing to me. I would expect a flat list of changes, but instead the output is always a list containing one or two lists of changes. What is the meaning or intent of having multiple lists of changes?

Consider the following simple example:

> Diff::LCS.diff('abc', 'a-c')
# => [[#<Diff::LCS::Change:0x01 @action="-", @position=1, @element="b">,
#      #<Diff::LCS::Change:0x02 @action="+", @position=1, @element="-">],
#     [#<Diff::LCS::Change:0x03 @action="-", @position=3, @element="">]]

Ignoring the fact that the last change is blank, why are there two lists of changes instead of just one?

score 3 · Accepted Answer · answered Aug 28 '12 at 23:07

You might have better luck with a more revealing example. If you do this:

Diff::LCS.diff('ab cd', 'a- c_')

Then the output looks like this (with the noise removed):

[
  [
    <@action="-", @position=1, @element="b">,
    <@action="+", @position=1, @element="-">
  ], [
    <@action="-", @position=4, @element="d">,
    <@action="+", @position=4, @element="_">
  ]
]

If we look at Diff::LCS.diff('ab cd ef', 'a- c_ e+'), then we'd get three inner arrays instead of two.

What possible reason could there be for this? There are three operations in a diff:

Add a string.
Remove a string.
Change a string.

A change is really just a combination of removes and adds, so we're left with just remove and add as the fundamental operations; these line up with the @action values quite nicely. However, when humans look at diffs, we want to see a change as a distinct operation: we want to see that b has become -, and the "remove b, add -" version is an implementation detail.

If all we had was this:

[
  <@action="-", @position=1, @element="b">,
  <@action="+", @position=1, @element="-">,
  <@action="-", @position=4, @element="d">,
  <@action="+", @position=4, @element="_">
]

then you'd have to figure out which +/- pairs were really changes and which were separate additions and removals.

So the inner arrays map the two fundamental operations (add, remove) to the three operations (add, remove, change) that humans want to see.
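To make that mapping concrete, here is a minimal sketch that labels each inner array as an add, a remove, or a change. It does not load the gem: `Change` is a stand-in Struct with the same `action`, `position`, and `element` readers as `Diff::LCS::Change`, and `classify_hunk` is a hypothetical helper name, not part of diff-lcs. It also assumes the elements are strings, so they can be joined.

```ruby
# Stand-in for Diff::LCS::Change so the sketch runs without the gem.
# Assumption: the real objects respond to the same three readers.
Change = Struct.new(:action, :position, :element)

# Hypothetical helper: label one hunk (inner array) as :add, :remove, or :change
# and collect the removed and added text.
def classify_hunk(hunk)
  removed = hunk.select { |c| c.action == '-' }
  added   = hunk.select { |c| c.action == '+' }

  kind =
    if removed.empty? then :add
    elsif added.empty? then :remove
    else :change
    end

  {
    kind: kind,
    old: removed.map(&:element).join, # assumes string elements
    new: added.map(&:element).join
  }
end

# Hand-built copy of the hunks shown above for Diff::LCS.diff('ab cd', 'a- c_').
hunks = [
  [Change.new('-', 1, 'b'), Change.new('+', 1, '-')],
  [Change.new('-', 4, 'd'), Change.new('+', 4, '_')]
]

summary = hunks.map { |hunk| classify_hunk(hunk) }
summary.each { |s| puts "#{s[:kind]}: #{s[:old].inspect} -> #{s[:new].inspect}" }
```

With the gem installed, the same helper should work on the real return value of `Diff::LCS.diff`, since it only relies on those three readers.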

You might want to examine the structure of the outputs from these as well:

Diff::LCS.diff('ab cd', 'a- x c_')
Diff::LCS.diff('ab', 'abx')
Diff::LCS.diff('ab', 'xbx')

I think an explicit change @action for Diff::LCS::Change would be better, but at least the inner arrays let you group the individual additions and removals into higher-level edits.

Ah yes, that looks right - each inner array represents a "change" (a "delete/add" pair at the same position). So `Diff::LCS.diff('a1 b2 c3 d4', 'aw bx cy dz').size # => 4` since there are four changes. Thanks! — maerics, Aug 29 '12 at 13:23
@maerics: Yeah, if there was just one flat list of edits you'd have to manually match up the `@position` values to extract the C-level edits from the ASM-level list of `Diff::LCS::Change`s. The [documentation for the Perl version](http://search.cpan.org/dist/Algorithm-Diff/lib/Algorithm/Diff.pm#diff) is worth a look: "The description is a list of *hunks*; each hunk represents a contiguous section of items which should be added, deleted, or replaced. The return value of `diff` is a list of hunks..." — mu is too short, Aug 29 '12 at 17:24
I understand the structure of the array of arrays, but I'm still left wondering how I would get a human-friendly difference. Such as `old content` wrapped in a `del` tag and `new sentence` wrapped in an `add` tag. — Archonic, Mar 26 '13 at 17:26
@Archonic: Maybe a new question is in order, that would give everyone more room to work with than comments. Basically, you backtrack from the `@action`s inside the hunks to reconstruct the before and after versions. — mu is too short, Mar 26 '13 at 17:38
Figured it out here: http://stackoverflow.com/questions/15648875/make-output-of-diff-lcs-human-readable/15648876#15648876 — Archonic, Mar 26 '13 at 22:53
