1

I have files that I want to compare, and I need a way to force certain lines to match so that the algorithm picks the correct blocks to compare.

For example: FILE1

test1
    subline1
    subline2
    subline3
test2
    subline1
    subline2
    subline3
    subline4
test3
    subline1
    subline2
    subline3
test4
    subline1
test5
    subline2
    subline3
    subline4

FILE2

test1
    subline1
    subline2
    subline3
test3
    subline1
    subline2
    subline3
    subline4
test4
    subline1
    subline2
    subline3
    subline4

With every tool I have tried, I cannot force a perfect match on the lines containing "test", and since the contents of the blocks are similar, the blocks are always matched incorrectly.

See the images below: Notepad++ Compare and WinMerge.

Meld and diff didn't work either.

Thanks

xlash
  • 3
    Recent versions of `meld` can do it if you place synchronization points. – kasperd Feb 05 '16 at 19:01
  • In 3.12.3, synchronization points were manual, and I could not get them to work both ways. I also tried 3.14.2 without success. When I add a sync point, nothing lines up. – xlash Feb 09 '16 at 17:37
  • Synchronization points are placed manually because they are used when you are not satisfied with how the automated algorithms line up the differences. They work for me in 1.8.4. – kasperd Feb 09 '16 at 17:42
  • 1
    Beyond Compare can do that, but it's not free – phuclv May 17 '17 at 07:15

2 Answers

1

diff is a line-based algorithm, but it seems what you want to match are not lines, but blocks of lines.

One possibility is to use an intermediate step that puts each block on one line by joining the lines of the block together; then you could use diff on the result.
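A minimal sketch of that intermediate step, assuming a block starts at every unindented line and continues through the indented lines below it. The helper name `join_blocks` and the ` | ` separator are made up for illustration, and the sample data is a shortened version of the files in the question.

```python
import difflib


def join_blocks(text, sep=" | "):
    """Collapse each block (unindented header plus indented sublines) onto one line."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        if line[0].isspace() and rows:
            rows[-1] += sep + line.strip()  # indented line: append to the current block
        else:
            rows.append(line.strip())  # unindented line: start a new block
    return rows


file1 = """test1
    subline1
test2
    subline1
    subline2
test3
    subline1
"""

file2 = """test1
    subline1
test3
    subline1
    subline2
"""

# Diff the flattened blocks instead of the raw lines.
for row in difflib.unified_diff(
    join_blocks(file1), join_blocks(file2), "FILE1", "FILE2", lineterm=""
):
    print(row)
```

Because a whole block becomes a single line, a block whose content changed shows up as one removed line and one added line, so the header can no longer be paired with the sublines of a different block.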

Another option is git diff, which supports 4 different diff'ing algorithms. You can diff two files even if they aren't in a git repo (git diff --no-index compares two arbitrary paths):

--diff-algorithm={patience|minimal|histogram|myers}
       Choose a diff algorithm. The variants are as follows:

       default, myers
           The basic greedy diff algorithm. Currently, this is the default.

       minimal
           Spend extra time to make sure the smallest possible diff is produced.

       patience
           Use "patience diff" algorithm when generating patches.

       histogram
           This algorithm extends the patience algorithm to "support low-occurrence common elements".

       For instance, if you configured diff.algorithm variable to a non-default value and want to use the
       default one, then you have to use --diff-algorithm=default option.

However, in testing your files, all the algorithms produced the same result as diff would.

There are other tools for diffing structured formats, like XML or JSON, but the block-wise diffs you'd like are neither line-based nor in another formal structure.

Ultimately, I think for a diff'ing algorithm to work for you, your data needs to be line-based or another formal format.
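One way to give the data a formal structure is to parse each block into a JSON-style object keyed by its header line and then compare the objects key by key. The following is only a sketch: `parse_blocks` and `compare_blocks` are hypothetical helpers, and it assumes that header lines are unindented and unique within a file.

```python
import json


def parse_blocks(text):
    """Map each unindented header line to the list of indented sublines below it."""
    blocks = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        if line[0].isspace() and current is not None:
            blocks[current].append(line.strip())
        else:
            current = line.strip()
            blocks[current] = []  # assumption: a repeated header would overwrite
    return blocks


def compare_blocks(a, b):
    """Report differences between two parsed files, matching blocks by header."""
    report = {}
    for key in sorted(set(a) | set(b)):
        if key not in b:
            report[key] = "only in first"
        elif key not in a:
            report[key] = "only in second"
        elif a[key] != b[key]:
            report[key] = {
                "removed": [s for s in a[key] if s not in b[key]],
                "added": [s for s in b[key] if s not in a[key]],
            }
    return report


first = parse_blocks("test1\n    subline1\ntest2\n    subline1\ntest3\n    subline1\n")
second = parse_blocks("test1\n    subline1\ntest3\n    subline1\n    subline2\n")

report = compare_blocks(first, second)
print(json.dumps(report, indent=2))
```

Matching on the header key is what forces "test3" in one file to be compared with "test3" in the other, regardless of how similar the neighbouring blocks are. The parsed dictionaries could also be written out with `json.dumps` and handed to a dedicated JSON diff tool.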

phuclv
Mark Stosberg
  • That is really tedious. I would prefer a way to force certain lines to match. – xlash Feb 05 '16 at 17:48
  • I got the same results as you with the difftools in git. I can work out the structure of that "must-match" line so that it represents a JSON object with content in it. – xlash Feb 06 '16 at 16:07
  • JSON diff did the trick. Please repost that answer, and I'll accept it. http://goo.gl/p9AWOC – xlash Feb 06 '16 at 16:32
  • I'm not sure what you mean by repost. I suggested converting it to JSON already in my answer. The suggestion of `git diff` may be useful to someone else searching with a similar question. – Mark Stosberg Feb 07 '16 at 02:29
-1

TotalCommander will help you (compare by content)