I am building an experiment in which the video of a talking person moves randomly around the screen. The user/player should follow the video and listen attentively to the speech, while an interfering audio track of a concurrent speaker is also played back.

For easier manipulation of the video positions and audio playback features, I chose to play the video and audio (both pertaining to the same original .mp4) as independent streams. Hence my major requirement is a precise synchronization between the video and audio stream, ideally to lip-sync precision - to simulate a perfectly synchronized video podcast. Note that I need the synchronization to remain robust over the entire playback time (up to 10-15 minutes).

After reading some documentation, I concluded that pyaudio and pygame provide a good framework to implement this.

I therefore extract the audio stream from the mp4 and play it together with the audio of the interfering speaker on 2 separate channels via the pyaudio module in callback mode (the sampling frequency is 48 kHz and the buffer size is 256 samples). Pyaudio plays the stream in a new thread:

import pyaudio
import pygame

class AudioPlayer:
    def __init__(self, cond_params):
        self.pa = pyaudio.PyAudio()
        # start=False: the stream is opened here but only started in play()
        self.audio_stream = self.pa.open(format=pyaudio.paFloat32,
                                         channels=2,
                                         rate=48000,
                                         start=False,
                                         output=True,
                                         stream_callback=self.callback,
                                         frames_per_buffer=256)

    def play(self):
        self.start_playback_pygame = pygame.time.get_ticks() / 1000  # start of playback in pygame units (seconds)
        self.start_playback_audio = self.audio_stream.get_time()  # start of playback in pyaudio units (get_time() already returns seconds)
        self.audio_stream.start_stream()

        return self.start_playback_pygame

    def callback(self, in_data, frame_count, time_info, status):
        # pyaudio calls this from its own thread with these four arguments
        ...  # [...code omitted...]
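
To make the audio side of the comparison concrete: one way to get an audio clock is to count the frames handed to the device in the callback. This is only a minimal sketch, not the code I am running. `AudioClock`, `advance()` and `seconds()` are hypothetical names, and the loop stands in for the pyaudio callback so that the snippet runs without a sound card.

```python
import threading


class AudioClock:
    """Counts frames handed to the audio device and converts them to seconds."""

    def __init__(self, sample_rate):
        self.sample_rate = sample_rate
        self._frames = 0
        self._lock = threading.Lock()  # the callback runs in another thread

    def advance(self, frame_count):
        # Intended to be called once per audio callback with its frame_count.
        with self._lock:
            self._frames += frame_count

    def seconds(self):
        # Audio position in seconds, derived purely from the sample count.
        with self._lock:
            return self._frames / self.sample_rate


clock = AudioClock(sample_rate=48000)
for _ in range(375):  # simulate 375 callbacks of 256 frames = 96000 frames
    clock.advance(256)
print(clock.seconds())  # 2.0
```

Such a counter runs slightly ahead of what is actually heard, because the frames still have to pass through the output buffers. pyaudio exposes `get_output_latency()` on the stream, which could be subtracted as a rough correction.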

For the video playback, I use the pygame loop to read a new video frame via the OpenCV module (cv2), compute the new random location and blit the frame to the screen on each game-loop iteration.

import cv2
import pygame

class VideoPlayer:
    def __init__(self, screen, cond_params):
        self.video_capture = cv2.VideoCapture('path_to_video')
        self.fps = self.video_capture.get(cv2.CAP_PROP_FPS)
        self.video_pos = (0, 0)  # initial position
        self.running = False
        self.update_count = 0
        self.frame_count = 0  # number of video frames read so far
        self.screen = screen
        # self.video_dims, the (width, height) of the displayed video, is set elsewhere (code omitted)

    def update(self, start_playback_time=None):
        """
        Update with the next video frame at a random new location. Called in every game frame, this will act like a loop, always reading the next video frame.
        """
        empty_screen(self.screen)  # helper that clears the screen (code omitted)
        self.update_count += 1

        # read the next video frame
        self.running, self.curr_frame = self.video_capture.read()
        if self.running:
            self.curr_frame = cv2.resize(self.curr_frame, self.video_dims)
            frame_to_blit = pygame.image.frombuffer(self.curr_frame.tobytes(), self.video_dims, "BGR")
            self.frame_count += 1
            self.compute_new_pos()  # code omitted

            # update screen size if necessary
            self.screen_dims = self.screen.get_size()
            self.screen.blit(frame_to_blit, self.video_pos)
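
For comparison, the update above always reads exactly one frame per game-loop iteration, so the video position is simply the number of iterations so far. A different scheme would derive the frame to show from the audio time. The following is a rough sketch of that idea under my own assumptions, not something I have tested in the experiment; `target_frame_index` and `frames_to_advance` are hypothetical helpers.

```python
def target_frame_index(audio_seconds, fps):
    """Index of the video frame that belongs to the given audio time."""
    return int(audio_seconds * fps)


def frames_to_advance(current_index, audio_seconds, fps):
    """How many frames to read before blitting.

    0 means hold the current frame (video is ahead of or level with the audio),
    1 means show the next frame, and more than 1 means skip frames to catch up.
    """
    return max(0, target_frame_index(audio_seconds, fps) - current_index)


# Audio is at 0.5 s, video runs at 30 fps, so frame 15 should be on screen.
print(frames_to_advance(14, 0.5, 30))  # 1 -> show the next frame
print(frames_to_advance(15, 0.5, 30))  # 0 -> hold the current frame
print(frames_to_advance(10, 0.5, 30))  # 5 -> video is behind, skip ahead
```

With cv2, skipping could presumably be done by calling `video_capture.grab()` for the frames that are dropped and `read()` only for the frame that is displayed.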

The Media class handles both AudioPlayer() and VideoPlayer(). Once a button on the screen is pressed (code omitted), the experimental condition starts, i.e. the audio and video are played simultaneously. On every game-loop iteration, an update() function is called within the Media class to load the next video frame. This is also where I handle the synchronization, by computing the delay between the audio stream (handled by pyaudio in a different thread) and the video stream (handled by pygame in the current thread).

class Media:
    def __init__(self, cond_params, clock):
        """
        Initialise the audio and video stimuli
        """
        self.clock = clock
        self.start_playback_time = None  # start time measured by pygame clock once the AudioPlayer started the playback (in seconds)
        self.cond_params = cond_params
        self.update_count = 0
        self.delay_AV = None  # keep track of the delay between video and audio streams (in seconds)
        # The display must exist before the VideoPlayer that draws on it
        self.screen = pygame.display.set_mode((700, 700), pygame.RESIZABLE, 16)
        self.AudioPlayer = AudioPlayer(self.cond_params)
        self.VideoPlayer = VideoPlayer(self.screen, cond_params)  # init the display for the current condition

    def play(self, screen):
        """
        Start the condition. The audio and video are started simultaneously
        """
        # Start the auditory stimulus
        self.start_playback_time = self.AudioPlayer.play()  # returns the start time in pygame units
        return self.start_playback_time

    def update(self, events):
        """
        Update the GUI and activate or stop any processes if necessary
        :param events: contains all user inputs (clicks, key presses...)
        """
        self.update_count += 1
        if self.AudioPlayer.playback_active():
            self.VideoPlayer.update()

            # FOR SYNCHRONIZATION: query the current time for both the audio stream (pyaudio)
            # and the video stream (handled in pygame)
            curr_time_audio = self.get_audio_stream_time()
            curr_time_pygame = pygame.time.get_ticks() / 1000 - self.start_playback_time
            self.delay_AV = curr_time_pygame - curr_time_audio

            # Update the display to show the new video frame at the correct location
            pygame.display.update()

        if self.delay_AV is not None:
            if self.delay_AV < 0:
                print("------ The Video stream is currently BEHIND (comes AFTER) the Audio stream by %.7f sec. Adjusting FPS... " % abs(self.delay_AV), flush=True)
            else:
                print("------ The Video stream is currently AHEAD (comes BEFORE) the Audio stream by %.7f sec. Adjusting FPS... " % self.delay_AV, flush=True)
            return self.delay_AV
        else:
            # no delay applied
            return 0

    def get_audio_stream_time(self):
        # Seconds elapsed on the pyaudio stream clock since playback started
        return self.AudioPlayer.audio_stream.get_time() - self.AudioPlayer.start_playback_audio

In the main game loop, I try to compensate for the AV (Audio-Visual) delay by adjusting the fps of the pygame loop, by using clock.tick_busy_loop():

import json
import sys

import pygame
from Media import Media  # the Media class shown above, assumed to live in Media.py

pygame.init()

with open("path_to_params_file") as params_file:
    cond_params = json.load(params_file)
clock = pygame.time.Clock()
media = Media(cond_params, clock)
fps = 30  # matches the fps of the video file

# The main loop
running = True
while running:
    # Check for user inputs
    events = pygame.event.get()
    for event in events:
        # Check whether the user quit the program
        if event.type == pygame.QUIT:
            running = False
            pygame.quit()
            sys.exit()

        # Check whether the user wishes to toggle the full-screen mode
        elif event.type == pygame.KEYDOWN:
            if event.key == pygame.K_ESCAPE:
                pygame.display.toggle_fullscreen()

    # Update the media (mainly the video frame at new location, audio plays in a separate thread)
    delay_AV = media.update(events)
    clock.tick_busy_loop(1 / (1 / fps + delay_AV))  # compensate for audio-video delay
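
To show what frame rate the last line actually requests, here is the same arithmetic in isolation. `compensated_fps` is a hypothetical helper that only mirrors the expression `1/(1/fps + delay_AV)`; it is an illustration of the formula, not a fix.

```python
def compensated_fps(fps, delay_av):
    """Frame rate requested by 1 / (1 / fps + delay_av), with delay_av in seconds.

    Returns None when the compensated frame period is zero or negative,
    i.e. when the expression no longer describes a usable frame rate.
    """
    period = 1 / fps + delay_av
    if period <= 0:
        return None
    return 1 / period


for delay in (0.010, 0.0, -0.010, -0.040):
    print(f"delay_AV = {delay:+.3f} s -> requested fps: {compensated_fps(30, delay)}")
```

A positive delay lowers the requested rate (about 23 fps for +10 ms), while a negative delay raises it (about 43 fps for -10 ms). Once delay_AV reaches -1/fps (about -33 ms at 30 fps), the denominator in the original expression is zero or negative: at exactly zero Python raises ZeroDivisionError, and below that the argument passed to tick_busy_loop() is a negative frame rate.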

Unfortunately, this code doesn't achieve the precise sync I need. After about 20-30 seconds of playback, the video stream starts to speed up and goes out of sync with the audio. Maybe my understanding of how to compensate for the delay in the main game loop is not correct.


I also tried another approach in which I kept the fps constant (30 fps, matching the video's fps) and compensated for the delay right before calling pygame.display.update() inside the Media class, so that the next video frame is displayed at precisely the correct time. The code I used inside Media.update() is:

    [...]
    if self.delay_AV > 0:
        # The delay can only be compensated if the current time of the video thread is past
        # the current time of the audio thread, in which case the pygame thread waits
        # until the audio thread catches up with the current time of the pygame-video thread.
        delay_to_apply = int(round(np.abs(self.delay_AV)))
        t1 = compute_current_time(self.start_playback_time, "msec")
        actual_pygame_delay_ms = pygame.time.delay(delay_to_apply)
        # Note: I had to round the delay to the nearest integer because pygame.time.delay()
        # only accepts integers as input. Unfortunately some precision is lost.

        # Update display only after the "catch-up time" has passed
        pygame.display.update()

This second approach doesn't seem to work either: after 20-30 sec of well-synced playback, the AV streams start to diverge and the video gradually runs ahead of the audio (by some ms), making the desynchronization noticeable.


I have spent more than 2 weeks trying to debug this, so I would be grateful for any outside insight. Many thanks!

Final note: I'm running the code on Windows 10 with Python 3.10.1.

justius
  • Please trim your code to make it easier to find your problem. Follow these guidelines to create a [minimal reproducible example](https://stackoverflow.com/help/minimal-reproducible-example). – Community Feb 24 '22 at 22:33

0 Answers