Background: I have images I need to compare for differences. The images are large (on the order of 1400x9000 px), machine-generated and highly constrained (screenshots of a particular piece of linear UI), and are expected to be nearly identical, with differences being one of the following three possibilities:
- Image 1 has a section image 2 is missing
- Image 1 is missing a section image 2 has
- Both images have the given section, but its contents differ
I'm trying to build a tool that highlights the differences for a human reviewer, essentially an image version of line-oriented diff. To that end, I'm trying to scan the images line by line and compare them to decide if the lines are identical. My ultimate goal is an actual diff-like output, where it can detect that sections are missing/added/different, and sync the images up as soon as possible for the remaining parts of identical content, but for the first cut, I'm going with a simpler approach where the two images are overlaid (alpha blended), and the lines which were different highlighted with a particular colour (ie. alpha-blended with a third line of solid colour). At first I tried using Python Imaging Library, but that was far several orders of magnitude too slow, so I decided to try using vips
, which should be way faster. However, I have absolutely no idea how to express what I'm after using vips
operations. The pseudocode for the simpler version would be essentially:
out = []
# image1 and image2 are expected, but not guaranteed to have the same height
# they are likely to have different heights if different
# most lines are entirely white pixels
for line1, line2 in zip(image1, image2):
if line1 == line2:
out.append(line1)
else:
# ALL_RED is a line composed of solid red pixels
out.append(line1.blend(line2, 0.5).blend(ALL_RED, 0.5))
I'm using pyvips
in my project, but I'm also interested in code using plain vips
or any other bindings, since the operations are shared and easily translated across dialects.
Edit: adding sample images as requested
Edit 2: full size images with missing/added/changed sections: