My first thought was to shell out to the ffmpeg command with something like this.
Creating a Video from Images
ffmpeg can be used to stitch several images together into a video.
There are many options, but the following example should be enough to
get started. It takes all images that have filenames of
XXXXX.morph.jpg, where X is numerical, and creates a video called
"output.mp4". The qscale option specifies the picture quality (1 is
the highest, and 31 is the lowest), and the "-r" option is used to
specify the number of frames per second.
ffmpeg -r 25 -i %05d.morph.jpg -qscale:v 2 output.mp4
(The website that this blurb was taken from is gone. Link
has been removed.)
In that command, "-r 25" means 25 images per second. You could set this to 1 to show each image for one second, or use a fraction, e.g. 0.5 to show each image for two seconds.
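If you are shelling out from Python (an assumption on my part, since any language with a process API would do), a minimal sketch of building that command could look like the following. The helper names build_slideshow_cmd and make_slideshow are made up for illustration; the only ffmpeg options used are the ones shown above.

```python
import subprocess


def build_slideshow_cmd(pattern, output, seconds_per_image, quality=2):
    """Build the ffmpeg argument list for turning numbered images into a video.

    Hypothetical helper: it converts "seconds per image" into the frame
    rate that -r expects.
    """
    if seconds_per_image <= 0:
        raise ValueError("seconds_per_image must be positive")
    fps = 1.0 / seconds_per_image  # e.g. 2 seconds per image -> 0.5 fps
    return [
        "ffmpeg",
        "-r", f"{fps:g}",           # input frame rate (images per second)
        "-i", pattern,              # e.g. "%05d.morph.jpg"
        "-qscale:v", str(quality),  # output option, so it goes after -i
        output,
    ]


def make_slideshow(pattern, output, seconds_per_image):
    """Run ffmpeg; raises CalledProcessError if ffmpeg exits non-zero."""
    subprocess.run(build_slideshow_cmd(pattern, output, seconds_per_image),
                   check=True)
```

Passing the arguments as a list avoids shell quoting problems with filenames. Calling make_slideshow("%05d.morph.jpg", "output.mp4", 2) would show each image for two seconds, provided ffmpeg is on your PATH.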
You can then combine a video and audio stream with something like this.
ffmpeg -i video.mp4 -i audio.mp3 -c:v copy -c:a aac -b:a 128k final.mp4
Of course, choose codecs appropriate to your output. If you want an mp4, use libx264 for video and aac (built into ffmpeg and no longer "experimental") for audio.
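The same idea works for the audio step. Again assuming Python, and with build_mux_cmd and mux being names I made up, a sketch that leaves the codecs as parameters might be:

```python
import subprocess


def build_mux_cmd(video, audio, output, video_codec="copy",
                  audio_codec="aac", audio_bitrate="128k"):
    """Build the ffmpeg argument list that combines one video and one audio file.

    Hypothetical helper: the defaults mirror the command above, where
    "copy" keeps the existing video stream without re-encoding it.
    """
    return [
        "ffmpeg",
        "-i", video,
        "-i", audio,
        "-c:v", video_codec,    # "copy", or e.g. "libx264" to re-encode
        "-c:a", audio_codec,
        "-b:a", audio_bitrate,
        output,
    ]


def mux(video, audio, output, **codec_options):
    """Run ffmpeg; raises CalledProcessError if ffmpeg exits non-zero."""
    subprocess.run(build_mux_cmd(video, audio, output, **codec_options),
                   check=True)
```

For example, mux("video.mp4", "audio.mp3", "final.mp4", video_codec="libx264") would re-encode the video instead of copying it.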
Just remember, if you use a method like this, that ffmpeg writes its log output to stderr by default, so that is the stream to read when you want to see what it printed. It can be redirected to stdout if you prefer.
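As a rough illustration of that redirect, still assuming Python, subprocess can merge stderr into stdout so there is only one stream to read. The helper name run_and_capture is hypothetical, and it works for any command, not just ffmpeg.

```python
import subprocess


def run_and_capture(cmd):
    """Run a command and return (exit_code, combined_output).

    stderr is redirected into stdout, so ffmpeg's log lines end up in
    the returned text instead of on the terminal.
    """
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # send stderr to the same pipe as stdout
        text=True,                 # decode bytes to str
    )
    return result.returncode, result.stdout
```

You could call it as run_and_capture(["ffmpeg", "-i", "video.mp4", "-i", "audio.mp3", "-c:v", "copy", "-c:a", "aac", "-b:a", "128k", "final.mp4"]) and then check the exit code before trusting the output file.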