
We are implementing tests that compare some inline outputs from the program against the expected outputs, and we want to enforce different levels of "strictness":

  • exact match
  • exact match after trimming
  • include occurrences (i.e. all lines in the expected output are present in the actual output, in the same order)

All these levels of strictness can be easily implemented if a unified diff is not desired.
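
For context, without a unified diff the three checks could look roughly like this (a sketch; the helper names are my own):

    def exact_match(expected: str, actual: str) -> bool:
        return expected == actual

    def trimmed_match(expected: str, actual: str) -> bool:
        # Compare line by line after stripping surrounding whitespace.
        exp = [line.strip() for line in expected.strip().splitlines()]
        act = [line.strip() for line in actual.strip().splitlines()]
        return exp == act

    def occurrence_match(expected: str, actual: str) -> bool:
        # Every expected line must appear in the actual output, in the same
        # order, i.e. the expected lines form a subsequence of the actual lines.
        remaining = iter(actual.splitlines())
        return all(line in remaining for line in expected.splitlines())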

However, is it possible to use an existing unified diff generator library (Python's difflib), combined with some preprocessing and post-interpretation, to tell whether the outputs satisfy the third criterion above?

For example, if the expected output and the actual output are as follows:

Expected:

123
asd
fgh

Actual:

123
asd
test
fgh

If we only check for occurrences, this is evaluated as a match.

The diff will be something like this:

asd
+ test
fgh

One way that I have thought of is to check whether the diff contains only additions. This works in the example provided, as the line "test" is only present in the actual output, but I can't tell whether it holds for other cases.
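
A sketch of that idea with difflib (the filtering of the ---/+++ header lines is a bit rough here):

    import difflib

    def only_additions(expected: str, actual: str) -> bool:
        # Diff with expected on the left and actual on the right; if no body
        # line starts with '-', no expected line was dropped, i.e. every
        # expected line was matched in order somewhere in the actual output.
        diff = difflib.unified_diff(
            expected.splitlines(), actual.splitlines(),
            fromfile='expected', tofile='actual', lineterm='')
        return not any(
            line.startswith('-') and not line.startswith('---')
            for line in diff)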

As a followup, if this doesn't work, what would be a way to generate a unified diff while only checking for occurrence in Python?
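
Something along these lines is what I am after (a sketch, names made up), where the verdict would come from SequenceMatcher opcodes and the unified diff would still be produced for reporting:

    import difflib

    def check_and_report(expected: str, actual: str):
        exp, act = expected.splitlines(), actual.splitlines()
        sm = difflib.SequenceMatcher(None, exp, act)
        # A 'delete' or 'replace' opcode means some expected line is missing;
        # only 'equal' and 'insert' are compatible with the occurrence check.
        ok = all(tag in ('equal', 'insert') for tag, *_ in sm.get_opcodes())
        report = '\n'.join(difflib.unified_diff(
            exp, act, fromfile='expected', tofile='actual', lineterm=''))
        return ok, report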

PIG208
  • In general, the big problem with building unified diffs for use cases that don't really need them is that they're expensive to calculate -- much more expensive than is called for when you don't specifically _need_ to find the shortest-path edit. Also, they track order, so they can say lines were added when those lines aren't really new, but were just reordered. – Charles Duffy Dec 19 '21 at 19:04
  • ...if all you want is set arithmetic, it's cheaper to just use set arithmetic instead. In shell scripts, this means reaching for `comm` instead of `diff`; in Python... well, you have native sets out-of-the-box. – Charles Duffy Dec 19 '21 at 19:05
  • But beyond that, I don't know you've provided enough details in the question to really permit a good answer. I don't see anything about ordering sensitivity, f/e, nor the scope/size of your output (important to determine how relevant performance concerns are in practice). Insofar as this is at its core a tool-selection question, a lot of it comes down to how one prioritizes competing concerns, and that tends to be opinion-centric. – Charles Duffy Dec 19 '21 at 19:07

0 Answers