
I'm trying to copy EXIF tags from one JPEG to another, which has no metadata. I tried to do what is described in this comment.

My idea is to copy everything from the tags source file up to (but excluding) the first ffdb, then copy everything from the image source file (which has no tags) starting from the first ffdb, included. The resulting file is corrupt (missing SOS marker).

A full reproducer, including the suggestion by Luatic, is available at https://go.dev/play/p/9BLjuZk5qlr. Just run it in a directory containing a test.jpg file with tags.

This is the draft Go code to do so.

func copyExif(from, to string) error {
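    // Keep the original image as a backup; stop if the rename fails
    if err := os.Rename(to, to+"~"); err != nil {
        return err
    }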
    //defer os.Remove(to + "~")

    tagsSrc, err := os.Open(from)
    if err != nil {
        return err
    }
    defer tagsSrc.Close()

    imageSrc, err := os.Open(to + "~")
    if err != nil {
        return err
    }
    defer imageSrc.Close()

    dest, err := os.Create(to)
    if err != nil {
        return err
    }
    defer dest.Close()

    // copy from tagsSrc until ffdb, excluded
    buf := make([]byte, 1000000)
    n, err := tagsSrc.Read(buf)
    if err != nil {
        return err
    }
    x := 0
    for i := 0; i < n-1; i++ {
        if buf[i] == 0xff && buf[i+1] == 0xdb {
            x = i
            break
        }
    }
    _, err = dest.Write(buf[:x])
    if err != nil {
        return err
    }

    // skip ffd8 from imageSrc, then copy the rest (there are no tags here)
    skip := []byte{0, 0}
    _, err = imageSrc.Read(skip)
    if err != nil {
        return err
    }
    _, err = io.Copy(dest, imageSrc)
    if err != nil {
        return err
    }

    return nil
}

Checking the resulting files, it seems the code does what I described above.

On the top left, the source for tags. On the bottom left, the source for image. On the right, the result.

[Image: result]

Does anybody know what I'm missing? Thank you.

neclepsio
  • Could you provide the images you are testing with? – Luatic Jul 27 '23 at 09:36
  • Also, could you elaborate on what you mean by "copy metadata"? If both images have metadata, should the "old" metadata be fully thrown away and replaced by the new metadata? Do you care only about EXIF or also about copyright notices and comments? – Luatic Jul 27 '23 at 09:39
  • The second image has no metadata at all. You can use whatever jpeg with exif as source for exif, and whatever jpeg written by jpeg.Encode as source from image data. I will update the question with a full example. – neclepsio Jul 27 '23 at 09:55
  • I added a full reproducer, available at https://go.dev/play/p/9BLjuZk5qlr. Just run it in a folder containing a test.jpg with tags. – neclepsio Jul 27 '23 at 10:19

1 Answer


This turns out to be more difficult than expected. I referred to this resource which explains the general structure of JPEG as a stream of segments, the only exception being the "Entropy-Coded Segment" (ECS) which holds the actual image data.

Problems with your approach

My idea is to copy everything from the tags source file up to (but excluding) the first ffdb, then copy everything from the image source file (which has no tags) starting from the first ffdb, included. The resulting file is corrupt (missing SOS marker).

This makes very strong assumptions about JPEG files which won't hold. First of all, ffdb can very well appear somewhere inside a segment. Ordering of segments is also very loose, so you have no guarantee what comes before or after ffdb (the segment which defines the quantization tables). Even if it did somehow happen to work most of the time, it would still be a very brittle, unreliable solution.
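To illustrate the first point, here is a minimal sketch with a hand-built, hypothetical header (not taken from a real file). The APP1 payload happens to contain the bytes ff db, as it can when EXIF embeds a thumbnail, so a plain byte search stops inside the metadata instead of at the real quantization table segment. The names naiveSplit and sampleHeader are made up for this example.

```go
package main

import (
	"bytes"
	"fmt"
)

// naiveSplit returns the offset of the first FF DB byte pair, which is the
// split point the approach from the question relies on.
func naiveSplit(data []byte) int {
	return bytes.Index(data, []byte{0xFF, 0xDB})
}

// sampleHeader builds a hypothetical JPEG prefix: SOI, an APP1 segment whose
// payload contains FF DB, and then the start of a real DQT segment.
// It also returns the offset at which the real DQT marker begins.
func sampleHeader() (data []byte, dqtOffset int) {
	data = []byte{0xFF, 0xD8} // SOI
	payload := []byte{'E', 'x', 'i', 'f', 0, 0, 0xFF, 0xDB, 0x01}
	// Marker, then a 2-byte big-endian length that counts itself.
	data = append(data, 0xFF, 0xE1, 0x00, byte(len(payload)+2))
	data = append(data, payload...)
	dqtOffset = len(data)
	// Placeholder DQT segment (the table itself is omitted for brevity).
	data = append(data, 0xFF, 0xDB, 0x00, 0x03, 0x00)
	return data, dqtOffset
}

func main() {
	data, dqtOffset := sampleHeader()
	fmt.Printf("naive split at %d, real DQT segment at %d\n", naiveSplit(data), dqtOffset)
}
```

The naive offset falls inside the APP1 payload, so cutting there would truncate the metadata segment and leave its length field pointing past the cut.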

Proper approach

The proper approach is to iterate over all the segments, copying only metadata segments from the file providing the metadata and only non-metadata segments from the file providing the image data.
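As a rough sketch of what "iterating over segments" means, the following walks an in-memory byte slice from SOI up to and including the SOS header. It assumes every marker in that range is followed by a 2-byte big-endian length that counts itself, which is the same assumption the full program below makes. The segment type and headerSegments function are illustrative names, not part of that program.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// segment is one FF xx marker segment.
type segment struct {
	marker  byte   // second byte of the FF xx marker
	payload []byte // data after the 2-byte length field
}

// headerSegments returns all segments between SOI and the SOS header
// (inclusive). Entropy-coded data after SOS is not parsed here.
func headerSegments(data []byte) ([]segment, error) {
	if len(data) < 2 || data[0] != 0xFF || data[1] != 0xD8 {
		return nil, errors.New("expected SOI")
	}
	var segs []segment
	pos := 2
	for {
		if pos+4 > len(data) {
			return nil, errors.New("truncated segment header")
		}
		if data[pos] != 0xFF {
			return nil, errors.New("expected marker")
		}
		marker := data[pos+1]
		// The length counts its own two bytes but not the marker.
		length := int(binary.BigEndian.Uint16(data[pos+2 : pos+4]))
		if length < 2 || pos+2+length > len(data) {
			return nil, errors.New("invalid segment length")
		}
		segs = append(segs, segment{marker, data[pos+4 : pos+2+length]})
		pos += 2 + length
		if marker == 0xDA { // SOS: entropy-coded data follows
			return segs, nil
		}
	}
}

func main() {
	// Hypothetical, heavily shortened input: APP1, COM, then an empty SOS header.
	data := []byte{
		0xFF, 0xD8,
		0xFF, 0xE1, 0x00, 0x04, 'h', 'i',
		0xFF, 0xFE, 0x00, 0x03, 'x',
		0xFF, 0xDA, 0x00, 0x02,
		0x12, 0x34,
	}
	segs, err := headerSegments(data)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, s := range segs {
		fmt.Printf("marker FF %02X, %d payload bytes\n", s.marker, len(s.payload))
	}
}
```

With the segments separated like this, choosing which ones to take from which file becomes a simple filter on the marker byte.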

What complicates this is that for some reason, the ECS does not follow the segment conventions. Thus after reading SOS (Start of Scan), we need to skip to the end of the ECS by finding the next segment tag: 0xFF followed by a byte that is neither data (a zero) nor a "restart marker" (0xD0 - 0xD7).
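The scan rule can be sketched on an in-memory slice as follows. The function name ecsEnd is made up for this example; the full program below applies the same rule on a bufio.Reader using Peek and Discard.

```go
package main

import "fmt"

// ecsEnd returns the index of the first marker that ends the entropy-coded
// data in data, or -1 if no such marker is found. FF 00 (a stuffed data
// byte) and FF D0 through FF D7 (restart markers) belong to the ECS.
func ecsEnd(data []byte) int {
	for i := 0; i+1 < len(data); i++ {
		if data[i] != 0xFF {
			continue
		}
		next := data[i+1]
		stuffed := next == 0x00
		restart := next >= 0xD0 && next <= 0xD7
		if !stuffed && !restart {
			return i
		}
	}
	return -1
}

func main() {
	// Hypothetical scan data: a stuffed FF 00, a restart marker FF D3,
	// and finally EOI (FF D9), which ends the ECS.
	ecs := []byte{0x12, 0xFF, 0x00, 0x34, 0xFF, 0xD3, 0x56, 0xFF, 0xD9}
	fmt.Println("ECS ends at index", ecsEnd(ecs))
}
```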

For testing, I used this image with EXIF metadata. My test command looked as follows:

cp exif.jpg exif_stripped.jpg && exiftool -All= exif_stripped.jpg && go run main.go exif.jpg exif_stripped.jpg

I used exiftool to strip the EXIF metadata, and then tested the Go program by re-adding it. Using exiftool exif_stripped.jpg (or an image viewer of your choice) I then viewed the metadata and compared it against the output of exiftool exif.jpg (side note: you could probably make this Go program obsolete simply by using exiftool).

The program I wrote replaces EXIF metadata, comments, and copyright notices. I added a simple command-line interface for testing. If you want to keep only EXIF metadata, simply change the isMetaTagType function to

func isMetaTagType(tagType byte) bool { return tagType == exif }
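Another variant, following the suggestion in the comments to treat the whole APP1 - APP14 range as metadata, might look like the sketch below. The constants app1 and app14 are not part of the full program; they are introduced here for illustration, and whether every APPn segment in that range is safe to move between files depends on your images.

```go
package main

import "fmt"

const (
	app1    = 0xE1
	app14   = 0xEE
	comment = 0xFE
)

// isMetaTagType reports whether a marker byte is APP1..APP14 or a comment.
// Note the && for the range check: using || there would match every byte.
func isMetaTagType(tagType byte) bool {
	return (tagType >= app1 && tagType <= app14) || tagType == comment
}

func main() {
	for _, t := range []byte{0xE0, 0xE1, 0xEE, 0xFE, 0xDB, 0xDA} {
		fmt.Printf("FF %02X metadata: %v\n", t, isMetaTagType(t))
	}
}
```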

Full program

package main

import (
    "bufio"
    "errors"
    "io"
    "os"
)

const (
    soi = 0xD8
    eoi = 0xD9
    sos = 0xDA
    exif = 0xE1
    copyright = 0xEE
    comment = 0xFE
)

func isMetaTagType(tagType byte) bool {
    // Adapt as needed
    return tagType == exif || tagType == copyright || tagType == comment
}

func copySegments(dst *bufio.Writer, src *bufio.Reader, filterSegment func(tagType byte) bool) error {
    var buf [2]byte
    _, err := io.ReadFull(src, buf[:])
    if err != nil { return err }
    if buf != [2]byte{0xFF, soi} {
        return errors.New("expected SOI")
    }
    for {
        _, err := io.ReadFull(src, buf[:])
        if err != nil { return err }
        if buf[0] != 0xFF {
            return errors.New("invalid tag type")
        }
        if buf[1] == eoi {
            // Hacky way to check for EOF
            n, err := src.Read(buf[:1])
            if err != nil && err != io.EOF { return err }
            if n > 0 {
                return errors.New("EOF expected after EOI")
            }
            return nil
        }
        // Remember whether this is SOS, since the ECS follows its header
        isSOS := buf[1] == sos
        filter := filterSegment(buf[1])
        if filter {
            _, err = dst.Write(buf[:])
            if err != nil { return err }
        }

        _, err = io.ReadFull(src, buf[:])
        if err != nil { return err }
        if filter {
            _, err = dst.Write(buf[:])
            if err != nil { return err }
        }

        // Note: The length field counts its own two bytes, but not the tag, so subtract 2
        tagLength := ((uint16(buf[0]) << 8) | uint16(buf[1])) - 2
        if filter {
            _, err = io.CopyN(dst, src, int64(tagLength))
        } else {
            _, err = src.Discard(int(tagLength))
        }
        if err != nil { return err }
        if isSOS {
            // Find next tag `FF xx` in the stream where `xx != 0` to skip ECS
            // See https://stackoverflow.com/questions/2467137/parsing-jpeg-file-format-format-of-entropy-coded-segments-ecs
            for {
                bytes, err := src.Peek(2)
                if err != nil { return err }
                if bytes[0] == 0xFF {
                    data, rstMrk := bytes[1] == 0, bytes[1] >= 0xD0 && bytes[1] <= 0xD7
                    if !data && !rstMrk {
                        break
                    }
                }
                if filter {
                    err = dst.WriteByte(bytes[0])
                    if err != nil { return err }
                }
                _, err = src.Discard(1)
                if err != nil { return err }
            }
        }
    }
}

func copyMetadata(outImagePath, imagePath, metadataImagePath string) error {
    outFile, err := os.Create(outImagePath)
    if err != nil { return err }
    defer outFile.Close()
    writer := bufio.NewWriter(outFile)

    imageFile, err := os.Open(imagePath)
    if err != nil { return err }
    defer imageFile.Close()
    imageReader := bufio.NewReader(imageFile)

    metaFile, err := os.Open(metadataImagePath)
    if err != nil { return err }
    defer metaFile.Close()
    metaReader := bufio.NewReader(metaFile)

    _, err = writer.Write([]byte{0xFF, soi})
    if err != nil { return err }
    {
        // Copy metadata segments
        // It seems that they need to come first!
        err = copySegments(writer, metaReader, isMetaTagType)
        if err != nil { return err }
        // Copy all non-metadata segments
        err = copySegments(writer, imageReader, func(tagType byte) bool {
            return !isMetaTagType(tagType)
        })
        if err != nil { return err }
    }
    _, err = writer.Write([]byte{0xFF, eoi})
    if err != nil { return err }

    // Flush the writer, otherwise the last couple buffered writes (including the EOI) won't get written!
    return writer.Flush()
}

func replaceMetadata(toPath, fromPath string) error {
    copyPath := toPath + "~"
    err := os.Rename(toPath, copyPath)
    if err != nil { return err }
    defer os.Remove(copyPath)
    return copyMetadata(toPath, copyPath, fromPath)
}

func main() {
    if len(os.Args) < 3 {
        println("args: FROM TO")
        return
    }
    err := replaceMetadata(os.Args[2], os.Args[1])
    if err != nil {
        println("replacing metadata failed: " + err.Error())
    }
}
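The Flush call at the end of copyMetadata matters more than it may look. A small sketch of the effect, using a bytes.Buffer in place of the output file (writeEOI is a made-up helper for this demonstration):

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
)

// writeEOI writes the two-byte EOI marker through a bufio.Writer and returns
// how many bytes actually reached the underlying buffer.
func writeEOI(flush bool) (int, error) {
	var out bytes.Buffer
	w := bufio.NewWriter(&out)
	if _, err := w.Write([]byte{0xFF, 0xD9}); err != nil {
		return 0, err
	}
	if flush {
		if err := w.Flush(); err != nil {
			return 0, err
		}
	}
	return out.Len(), nil
}

func main() {
	without, _ := writeEOI(false)
	with, _ := writeEOI(true)
	fmt.Printf("bytes written without Flush: %d, with Flush: %d\n", without, with)
}
```

Without the flush, small trailing writes such as the EOI marker stay in the writer's internal buffer and never reach the file, which produces a truncated JPEG.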
Luatic
  • Thank you for your great effort! Something's still off. My approach for stripping metadata is simply decoding and re-encoding with the image package. In this case, when reading the result again using your test image, I get: "invalid JPEG format: short Huffman data". Using my image I get: "EOF expected after EOI". I will debug and report back. Thank you again! – neclepsio Jul 27 '23 at 13:06
  • You're welcome! Now that's odd, my image viewer and `exiftool` don't complain. I've tested with Python's Pillow and Java's ImageIO and both seem to accept `exif_stripped.jpg`, but I can confirm that Go's `jpeg.Decode` throws this error. I'll take a look - either many JPEG decoders are being too lenient here, or Go's is being too strict. – Luatic Jul 27 '23 at 14:21
  • @neclepsio found it, I forgot to flush the writer. Changing `return nil` to `return writer.Flush()` in `copyMetadata` does the trick. I was effectively missing a buffer at the end of the file, so EOI etc. were missing. I am negatively surprised that Pillow, ImageIO, image viewers etc. won't even output a warning, and I'm positively surprised by Go! Fun fact: Trying to strip the metadata from `exif_stripped.jpg` after processing did yield an error, but just reading didn't. Crazily lenient software. – Luatic Jul 27 '23 at 16:02
  • Thank you! The Pixel 4a camera app adds data after EOI, so I also removed the check for EOF after EOI. – neclepsio Jul 28 '23 at 07:47
  • @neclepsio sure, the question just is: What kind of data, and what do you want to do with it? Do you want to preserve or strip it? But I assume you already figured this out :) – Luatic Jul 28 '23 at 09:48
  • Yes, I want to strip it; I also discovered a thumbnail is present in 0xFFE1, which I'm going to strip as well. I'll post the complete code when I finish. – neclepsio Jul 28 '23 at 10:51
  • @neclepsio after some more review, you might want to update `isMetaTagType` to consider the entire `APP1` - `APP14` range as metadata: `return (tagType >= app1 && tagType <= app14) || tagType == comment` where `app1 = 0xE1` and `app14 = 0xEE`. – Luatic Jul 29 '23 at 21:42