
I have a link I can DM for a minimal working example!


Recording Videos

For recording, on the AVCaptureConnection of my AVCaptureSession, I set isVideoMirrored to true when using the front camera and false when using the back camera, all in portrait orientation.
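
For reference, a minimal sketch of that connection setup, assuming movieOutput is an AVCaptureMovieFileOutput already added to the session and isFrontCamera tracks the active camera (both hypothetical names):

    if let connection = movieOutput.connection(with: .video) {
        // portrait recording, mirrored only for the front camera
        connection.videoOrientation = .portrait
        if connection.isVideoMirroringSupported {
            connection.automaticallyAdjustsVideoMirroring = false
            connection.isVideoMirrored = isFrontCamera
        }
    }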

Saving Videos

When I save videos, I perform an AVAssetExportSession. If I used the front camera, I want to maintain isVideoMirrored = true, so I create an AVMutableComposition and set its video track's preferredTransform to CGAffineTransform(scaleX: -1.0, y: 1.0).rotated(by: CGFloat(Double.pi/2)). For the back camera, I export the AVAsset as recorded.

Part of my saving code:

    if didCaptureWithFrontCamera {

        let composition = AVMutableComposition()

        let assetVideoTrack = asset.tracks(withMediaType: .video).last!
        let assetAudioTrack = asset.tracks(withMediaType: .audio).last!

        let compositionVideoTrack = composition.addMutableTrack(withMediaType: AVMediaType.video, preferredTrackID: CMPersistentTrackID(kCMPersistentTrackID_Invalid))
        let compositionAudioTrack = composition.addMutableTrack(withMediaType: AVMediaType.audio, preferredTrackID: CMPersistentTrackID(kCMPersistentTrackID_Invalid))

        try? compositionVideoTrack?.insertTimeRange(CMTimeRangeMake(start: CMTime.zero, duration: asset.duration), of: assetVideoTrack, at: CMTime.zero)
        try? compositionAudioTrack?.insertTimeRange(CMTimeRangeMake(start: CMTime.zero, duration: asset.duration), of: assetAudioTrack, at: CMTime.zero)

        // keep the front-camera mirroring by baking it into the composition's transform
        compositionVideoTrack?.preferredTransform = CGAffineTransform(scaleX: -1.0, y: 1.0).rotated(by: CGFloat(Double.pi/2))

        guard let exportSession = AVAssetExportSession(asset: composition, presetName: AVAssetExportPreset1280x720) else {
            handler(nil)
            return
        }

        exportSession.outputURL = outputURL
        exportSession.outputFileType = .mp4
        exportSession.shouldOptimizeForNetworkUse = true
        exportSession.exportAsynchronously { handler(exportSession) }

    } else {

        // back camera: export the asset as recorded
        guard let exportSession = AVAssetExportSession(asset: asset, presetName: AVAssetExportPreset1280x720) else {
            handler(nil)
            return
        }

        exportSession.outputURL = outputURL
        exportSession.outputFileType = .mp4
        exportSession.shouldOptimizeForNetworkUse = true
        exportSession.exportAsynchronously { handler(exportSession) }
    }
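
For context, this snippet assumes it lives in something like the following (hypothetical signature; asset, outputURL, didCaptureWithFrontCamera, and handler are supplied by the caller):

    func save(asset: AVAsset,
              to outputURL: URL,
              didCaptureWithFrontCamera: Bool,
              handler: @escaping (AVAssetExportSession?) -> Void) {
        // ... body as shown above ...
    }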

Merging Videos

Later, to view the saved videos, I want to merge them into a single video via AVMutableComposition while maintaining each video's original orientation.

What has partially worked is setting the AVMutableComposition's video track preferredTransform to the preferredTransform of an individual AVAsset's video track. The only problem is that this single orientation is then applied to all the videos (i.e. one video's transform, mirrored or not, ends up applied to back-camera and front-camera recordings alike).

From the solutions I've come across, it appears I need to apply AVMutableVideoCompositionInstructions, but when I try, the AVAssetExportSession doesn't seem to factor in the videoComposition instructions at all.

Any guidance would be extremely appreciated as I haven't been able to solve it for the life of me...

My attempted merge code:

func merge(videos: [AVURLAsset], for date: Date, completion: @escaping (_ url: URL, _ asset: AVAssetExportSession) -> ()) {
    let videoComposition = AVMutableComposition()
    var lastTime: CMTime = .zero

    var count = 0
    var instructions = [AVMutableVideoCompositionInstruction]()
    let renderSize = CGSize(width: 720, height: 1280)

    guard let videoCompositionTrack = videoComposition.addMutableTrack(withMediaType: .video, preferredTrackID: Int32(kCMPersistentTrackID_Invalid)) else { return }
    guard let audioCompositionTrack = videoComposition.addMutableTrack(withMediaType: .audio, preferredTrackID: Int32(kCMPersistentTrackID_Invalid)) else { return }

    for video in videos {

        if let videoTrack = video.tracks(withMediaType: .video)[safe: 0] {

            // this is the only thing that seems to work, but not in the way I'd hoped, where each video keeps its original orientation
            //videoCompositionTrack.preferredTransform = videoTrack.preferredTransform

            if let audioTrack = video.tracks(withMediaType: .audio)[safe: 0] {

                do {
                    try videoCompositionTrack.insertTimeRange(CMTimeRangeMake(start: .zero, duration: video.duration), of: videoTrack, at: lastTime)
                    try audioCompositionTrack.insertTimeRange(CMTimeRangeMake(start: .zero, duration: video.duration), of: audioTrack, at: lastTime)

                    let layerInstruction = videoCompositionInstruction(videoTrack, asset: video, count: count)

                    // renamed from videoCompositionInstruction to avoid shadowing the helper function of the same name
                    let videoInstruction = AVMutableVideoCompositionInstruction()
                    videoInstruction.timeRange = CMTimeRangeMake(start: lastTime, duration: video.duration)
                    videoInstruction.layerInstructions = [layerInstruction]

                    instructions.append(videoInstruction)

                } catch {
                    return
                }
                lastTime = CMTimeAdd(lastTime, video.duration)
                count += 1

            } else {

                do {
                    try videoCompositionTrack.insertTimeRange(CMTimeRangeMake(start: .zero, duration: video.duration), of: videoTrack, at: lastTime)

                    let layerInstruction = videoCompositionInstruction(videoTrack, asset: video, count: count)

                    let videoInstruction = AVMutableVideoCompositionInstruction()
                    videoInstruction.timeRange = CMTimeRangeMake(start: lastTime, duration: video.duration)
                    videoInstruction.layerInstructions = [layerInstruction]

                    instructions.append(videoInstruction)

                } catch {
                    return
                }

                lastTime = CMTimeAdd(lastTime, video.duration)
                count += 1
            }
        }
    }

    let mutableVideoComposition = AVMutableVideoComposition()
    mutableVideoComposition.instructions = instructions
    mutableVideoComposition.frameDuration = CMTimeMake(value: 1, timescale: 30)
    mutableVideoComposition.renderSize = renderSize

    let dateFormatter = DateFormatter() // declared locally so the snippet compiles; presumably a property in the original
    dateFormatter.dateStyle = .long
    dateFormatter.timeStyle = .short
    let date = dateFormatter.string(from: date)

    let mergedURL = NSURL.fileURL(withPath: NSTemporaryDirectory() + "merged-\(date)" + ".mp4")

    guard let exporter = AVAssetExportSession(asset: videoComposition, presetName: AVAssetExportPresetHighestQuality) else { return }
    exporter.outputURL = mergedURL
    exporter.outputFileType = .mp4
    exporter.videoComposition = mutableVideoComposition
    exporter.shouldOptimizeForNetworkUse = true
    completion(mergedURL, exporter)
}

func videoCompositionInstruction(_ firstTrack: AVAssetTrack, asset: AVAsset, count: Int) -> AVMutableVideoCompositionLayerInstruction {
    let renderSize = CGSize(width: 720, height: 1280)

    let instruction = AVMutableVideoCompositionLayerInstruction(assetTrack: firstTrack)

    let assetTrack = asset.tracks(withMediaType: .video)[0]
    let t = assetTrack.fixedPreferredTransform // new transform fix
    let assetInfo = orientationFromTransform(t)

    if assetInfo.isPortrait {

        let scaleToFitRatio = renderSize.width / assetTrack.naturalSize.height
        let scaleFactor = CGAffineTransform(scaleX: scaleToFitRatio, y: scaleToFitRatio)
        var finalTransform = assetTrack.fixedPreferredTransform.concatenating(scaleFactor)

        if assetInfo.orientation == .rightMirrored || assetInfo.orientation == .leftMirrored {
            finalTransform = finalTransform.translatedBy(x: -t.ty, y: 0)
        }
        // note: finalTransform is computed above but never applied; only t is set here
        instruction.setTransform(t, at: CMTime.zero)

    } else {

        let renderRect = CGRect(x: 0, y: 0, width: renderSize.width, height: renderSize.height)
        let videoRect = CGRect(origin: .zero, size: assetTrack.naturalSize).applying(assetTrack.fixedPreferredTransform)

        let scale = renderRect.width / videoRect.width
        let transform = CGAffineTransform(scaleX: renderRect.width / videoRect.width, y: (videoRect.height * scale) / assetTrack.naturalSize.height)
        let translate = CGAffineTransform(translationX: .zero, y: ((renderSize.height - (videoRect.height * scale))) / 2)

        instruction.setTransform(assetTrack.fixedPreferredTransform.concatenating(transform).concatenating(translate), at: .zero)
    }

    if count == 0 {
        instruction.setOpacity(0.0, at: asset.duration)
    }

    return instruction
}

func orientationFromTransform(_ transform: CGAffineTransform) -> (orientation: UIImage.Orientation, isPortrait: Bool) {
    var assetOrientation = UIImage.Orientation.up
    var isPortrait = false

    if transform.a == 0 && transform.b == 1.0 && transform.c == -1.0 && transform.d == 0 {
        assetOrientation = .right
        isPortrait = true
    } else if transform.a == 0 && transform.b == 1.0 && transform.c == 1.0 && transform.d == 0 {
        assetOrientation = .rightMirrored
        isPortrait = true
    } else if transform.a == 0 && transform.b == -1.0 && transform.c == 1.0 && transform.d == 0 {
        assetOrientation = .left
        isPortrait = true
    } else if transform.a == 0 && transform.b == -1.0 && transform.c == -1.0 && transform.d == 0 {
        assetOrientation = .leftMirrored
        isPortrait = true
    } else if transform.a == 1.0 && transform.b == 0 && transform.c == 0 && transform.d == 1.0 {
        assetOrientation = .up
    } else if transform.a == -1.0 && transform.b == 0 && transform.c == 0 && transform.d == -1.0 {
        assetOrientation = .down
    }
    return (assetOrientation, isPortrait)
}

extension AVAssetTrack {

    // normalizes the translation component of preferredTransform for each rotation/mirroring case,
    // recomputing tx/ty from naturalSize
    var fixedPreferredTransform: CGAffineTransform {
        var t = preferredTransform
        switch (t.a, t.b, t.c, t.d) {
        case (1, 0, 0, 1):
            t.tx = 0
            t.ty = 0
        case (1, 0, 0, -1):
            t.tx = 0
            t.ty = naturalSize.height
        case (-1, 0, 0, 1):
            t.tx = naturalSize.width
            t.ty = 0
        case (-1, 0, 0, -1):
            t.tx = naturalSize.width
            t.ty = naturalSize.height
        case (0, -1, 1, 0):
            t.tx = 0
            t.ty = naturalSize.width
        case (0, 1, -1, 0):
            t.tx = naturalSize.height
            t.ty = 0
        case (0, 1, 1, 0):
            t.tx = 0
            t.ty = 0
        case (0, -1, -1, 0):
            t.tx = naturalSize.height
            t.ty = naturalSize.width
        default:
            break
        }
        return t
    }
}
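
Both code listings index tracks with a [safe:] subscript that isn't defined in the question; a minimal sketch of the kind of Collection extension presumably in use:

    extension Collection {
        // returns nil instead of trapping when the index is out of bounds
        subscript(safe index: Index) -> Element? {
            indices.contains(index) ? self[index] : nil
        }
    }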
Chris

1 Answer


Assuming your transformations are correct, I updated your merge function.

The main change is using a single AVMutableVideoCompositionInstruction with multiple AVMutableVideoCompositionLayerInstructions, and passing each layer instruction the correct CMTime at which it should take effect.

  func merge(videos: [AVURLAsset],
             for date: Date,
             completion: @escaping (_ url: URL, _ asset: AVAssetExportSession)->()) {
    let videoComposition = AVMutableComposition()
    
    guard let videoCompositionTrack = videoComposition.addMutableTrack(withMediaType: .video,
                                                                       preferredTrackID: Int32(kCMPersistentTrackID_Invalid)),
          let audioCompositionTrack = videoComposition.addMutableTrack(withMediaType: .audio,
                                                                       preferredTrackID: Int32(kCMPersistentTrackID_Invalid))
    else { return }
    
    
    var lastTime: CMTime = .zero
    var layerInstructions = [AVMutableVideoCompositionLayerInstruction]()
    
    for video in videos {
      guard let videoTrack = video.tracks(withMediaType: .video)[safe: 0] else { return }
      
      // add audio track if available
      if let audioTrack = video.tracks(withMediaType: .audio)[safe: 0] {
        do {
          try audioCompositionTrack.insertTimeRange(CMTimeRangeMake(start: .zero, duration: video.duration),
                                                    of: audioTrack,
                                                    at: lastTime)
        } catch {
          return
        }
      }
      
      // add video track
      do {
        try videoCompositionTrack.insertTimeRange(CMTimeRangeMake(start: .zero, duration: video.duration),
                                                  of: videoTrack,
                                                  at: lastTime)
        let layerInstruction = makeVideoCompositionInstruction(videoTrack,
                                                               asset: video,
                                                               atTime: lastTime)
        layerInstructions.append(layerInstruction)
      } catch {
        return
      }
      
      lastTime = CMTimeAdd(lastTime, video.duration)
    } // end for..in videos
    
    let renderSize = CGSize(width: 720, height: 1280)
    
    let videoInstruction = AVMutableVideoCompositionInstruction()
    videoInstruction.timeRange = CMTimeRangeMake(start: .zero, duration: lastTime)
    videoInstruction.layerInstructions = layerInstructions
    
    let mutableVideoComposition = AVMutableVideoComposition()
    mutableVideoComposition.instructions = [videoInstruction]
    mutableVideoComposition.frameDuration = CMTimeMake(value: 1, timescale: 30)
    mutableVideoComposition.renderSize = renderSize
    
    let dateFormatter = DateFormatter()
    dateFormatter.dateStyle = .long
    dateFormatter.timeStyle = .short
    
    let date = dateFormatter.string(from: date)
    
    let mergedURL = NSURL.fileURL(withPath: NSTemporaryDirectory() + "merged-\(date)" + ".mp4")
    
    guard let exporter = AVAssetExportSession(asset: videoComposition,
                                              presetName: AVAssetExportPresetHighestQuality) else { return }
    exporter.outputURL = mergedURL
    exporter.outputFileType = .mp4
    exporter.videoComposition = mutableVideoComposition
    exporter.shouldOptimizeForNetworkUse = true
    completion(mergedURL, exporter)
  }
  
  func makeVideoCompositionInstruction(_ videoTrack: AVAssetTrack,
                                       asset: AVAsset,
                                       atTime: CMTime) -> AVMutableVideoCompositionLayerInstruction {
    let renderSize = CGSize(width: 720, height: 1280)
    
    let instruction = AVMutableVideoCompositionLayerInstruction(assetTrack: videoTrack)
    
    let assetTrack = asset.tracks(withMediaType: .video)[0]
    let t = assetTrack.fixedPreferredTransform // new transform fix
    let assetInfo = orientationFromTransform(t)
    
    if assetInfo.isPortrait {
      let scaleToFitRatio = renderSize.width / assetTrack.naturalSize.height
      let scaleFactor = CGAffineTransform(scaleX: scaleToFitRatio, y: scaleToFitRatio)
      var finalTransform = assetTrack.fixedPreferredTransform.concatenating(scaleFactor)
      
      if assetInfo.orientation == .rightMirrored || assetInfo.orientation == .leftMirrored {
        finalTransform = finalTransform.translatedBy(x: -t.ty, y: 0)
      }
      instruction.setTransform(t, at: atTime)
      
    } else {
      
      let renderRect = CGRect(x: 0, y: 0, width: renderSize.width, height: renderSize.height)
      let videoRect = CGRect(origin: .zero, size: assetTrack.naturalSize).applying(assetTrack.fixedPreferredTransform)
      
      let scale = renderRect.width / videoRect.width
      let transform = CGAffineTransform(scaleX: renderRect.width / videoRect.width,
                                        y: (videoRect.height * scale) / assetTrack.naturalSize.height)
      let translate = CGAffineTransform(translationX: .zero,
                                        y: ((renderSize.height - (videoRect.height * scale))) / 2)
      
      instruction.setTransform(assetTrack.fixedPreferredTransform.concatenating(transform).concatenating(translate),
                               at: atTime)
    }
    
    // if atTime = 0, we can assume this is the first track being added
    if atTime == .zero {
      instruction.setOpacity(0.0,
                             at: asset.duration)
    }
    
    return instruction
  }
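
For completeness, a hypothetical call site, since merge(videos:for:completion:) hands back the exporter without starting the export (videoURLs and the result handling are placeholders):

    let assets = videoURLs.map { AVURLAsset(url: $0) }
    merge(videos: assets, for: Date()) { url, exporter in
        exporter.exportAsynchronously {
            switch exporter.status {
            case .completed:
                print("Merged video written to \(url)")
            default:
                print("Export failed: \(String(describing: exporter.error))")
            }
        }
    }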
mani
  • I tried the code but the orientation is still off... moreover, the video also displays -90 degrees from portrait view :(. Any additional thoughts? – Chris Nov 26 '22 at 07:47
  • @Chris can you post a minimal working project on github? – mani Dec 02 '22 at 07:33
  • Sorry for the delay! Is there a way I can forward you the link directly? – Chris Dec 13 '22 at 23:50