
I. Background

  1. I am trying to build an application that matches subtitles to the audio waveform very accurately, at the word level or even at the character level.
  2. The audio is expected to be Sanskrit chants (Yoga, rituals etc.), which contain extremely long compound words [ example - aṅganyā-sokta-mātaro-bījam is traditionally one word, broken only to assist reading ].
  3. The input transcripts/subtitles might be roughly in sync at the sentence/verse level, but will surely not be in sync at the word level.
  4. The application should be able to detect points of silence in the audio waveform, so that it can guess the start and end points of each word (or even each letter/consonant/vowel within a word). The chanted audio and the visual subtitle should then match perfectly at the word (or letter/consonant/vowel) level, with the UI highlighting or animating the exact word (or letter) being chanted at that moment and also showing it in a bigger font. The app's purpose is to assist learning Sanskrit chanting.
  5. It is not expected to be a 100% automated process, nor 100% manual, but a mix where the application assists the human as much as possible.
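
The silence detection in point 4 could start from something as simple as windowed RMS energy over the decoded PCM samples. Below is a minimal sketch of that idea; the `SilenceDetector` class and its threshold value are hypothetical, and it assumes signed 16-bit mono samples already decoded into a `short[]`:

```java
import java.util.ArrayList;
import java.util.List;

public class SilenceDetector {
    /** Returns the start index (in samples) of every window whose RMS is below the threshold. */
    static List<Integer> silentWindows(short[] samples, int windowSize, double rmsThreshold) {
        List<Integer> starts = new ArrayList<>();
        for (int start = 0; start + windowSize <= samples.length; start += windowSize) {
            double sumSquares = 0;
            for (int i = start; i < start + windowSize; i++) {
                double s = samples[i] / 32768.0; // normalise 16-bit sample to [-1, 1)
                sumSquares += s * s;
            }
            double rms = Math.sqrt(sumSquares / windowSize);
            if (rms < rmsThreshold) {
                starts.add(start);
            }
        }
        return starts;
    }

    public static void main(String[] args) {
        // Four windows of four samples each: loud, near-silent, loud, silent.
        short[] samples = {
            10000, -10000, 10000, -10000,
            10, -10, 10, -10,
            20000, -20000, 20000, -20000,
            0, 0, 0, 0
        };
        System.out.println(silentWindows(samples, 4, 0.01)); // prints [4, 12]
    }
}
```

In practice the window size and threshold would need tuning per recording, and word boundaries in continuous chanting will often be low-energy dips rather than true silence, so this would only be a first-pass assist for the human reviewer.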

II. Below is the first code I wrote for this purpose, in which I:

  1. Open an mp3 (or any audio format) file,
  2. Seek to some arbitrary point in the timeline of the audio file // as of now playing from zero offset
  3. Get the audio data in raw format for 2 purposes - (1) playing it and (2) drawing the waveform,
  4. Play the raw audio data using the standard Java audio libraries.
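
For the waveform-drawing half of step 3, one common approach is to reduce the decoded samples to a min/max pair per pixel column and draw a vertical line between the two. A rough sketch (the `WaveformBuckets` class is hypothetical; it assumes 16-bit samples already in a `short[]`):

```java
public class WaveformBuckets {
    /** For each of the given pixel columns, compute {min, max} over its bucket of samples. */
    static int[][] minMaxPerColumn(short[] samples, int columns) {
        int[][] peaks = new int[columns][2];
        int bucket = Math.max(1, samples.length / columns); // samples per pixel column
        for (int c = 0; c < columns; c++) {
            int start = c * bucket;
            int end = Math.min(samples.length, start + bucket);
            int min = Short.MAX_VALUE, max = Short.MIN_VALUE;
            for (int i = start; i < end; i++) {
                if (samples[i] < min) min = samples[i];
                if (samples[i] > max) max = samples[i];
            }
            peaks[c][0] = min;
            peaks[c][1] = max;
        }
        return peaks;
    }

    public static void main(String[] args) {
        short[] samples = {100, -200, 300, -50, 7, -7, 5, -5};
        int[][] peaks = minMaxPerColumn(samples, 2);
        // First column covers the loud samples, second the quiet ones.
        System.out.println(peaks[0][0] + ".." + peaks[0][1] + "  " + peaks[1][0] + ".." + peaks[1][1]);
        // prints -200..300  -7..7
    }
}
```

The quiet columns (small min/max spread) are also exactly where candidate word boundaries would show up visually.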

III. The problem I am facing is that between every cycle there is a screeching sound.

  • Probably I need to close the line between cycles? Sounds simple, I can try.
  • But I am also wondering if this overall approach itself is correct. Any tip, guide, suggestion, or link would be really helpful.
  • Also, I just hard-coded the sample rate etc. (44100 Hz), are these good to set as default presets, or should they depend on the input format?
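
One detail worth checking alongside the presets: in the code below the ffmpeg output format is `"s16be"` (big-endian) while the `AudioFormat` is built with `bigEnd = false` (little-endian). If those two byte orders disagree, every sample is decoded with its bytes swapped, which by itself produces loud noise. A tiny standalone demonstration of how much the interpretation differs:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class EndiannessCheck {
    public static void main(String[] args) {
        // The same two PCM bytes decode to very different 16-bit samples
        // depending on which byte order the AudioFormat expects.
        byte[] pcm = {0x01, 0x00}; // one sample, two bytes
        short little = ByteBuffer.wrap(pcm).order(ByteOrder.LITTLE_ENDIAN).getShort();
        short big    = ByteBuffer.wrap(pcm).order(ByteOrder.BIG_ENDIAN).getShort();
        System.out.println("as s16le: " + little + ", as s16be: " + big);
        // prints: as s16le: 1, as s16be: 256
    }
}
```

As for the hard-coded 44100 Hz: since ffprobe is already being run, the stream metadata it returns should be the source of truth for sample rate and channel count, with the hard-coded values kept only as fallbacks when probing fails.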

IV. Here is the code

import com.github.kokorin.jaffree.StreamType;
import com.github.kokorin.jaffree.ffmpeg.FFmpeg;
import com.github.kokorin.jaffree.ffmpeg.FFmpegProgress;
import com.github.kokorin.jaffree.ffmpeg.FFmpegResult;
import com.github.kokorin.jaffree.ffmpeg.NullOutput;
import com.github.kokorin.jaffree.ffmpeg.PipeOutput;
import com.github.kokorin.jaffree.ffmpeg.ProgressListener;
import com.github.kokorin.jaffree.ffprobe.Stream;
import com.github.kokorin.jaffree.ffmpeg.UrlInput;
import com.github.kokorin.jaffree.ffprobe.FFprobe;
import com.github.kokorin.jaffree.ffprobe.FFprobeResult;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.DataLine;
import javax.sound.sampled.SourceDataLine;


public class FFMpegToRaw {
    Path BIN = Paths.get("f:\\utilities\\ffmpeg-20190413-0ad0533-win64-static\\bin");
    String VIDEO_MP4 = "f:\\org\\TEMPLE\\DeviMahatmyamRecitationAudio\\03_01_Devi Kavacham.mp3";
    FFprobe ffprobe;
    FFmpeg ffmpeg;

    public void basicCheck() throws Exception {
        if (BIN != null) {
            ffprobe = FFprobe.atPath(BIN);
        } else {
            ffprobe = FFprobe.atPath();
        }
        FFprobeResult result = ffprobe
                .setShowStreams(true)
                .setInput(VIDEO_MP4)
                .execute();

        for (Stream stream : result.getStreams()) {
            System.out.println("Stream " + stream.getIndex()
                    + " type " + stream.getCodecType()
                    + " duration " + stream.getDuration(TimeUnit.SECONDS));
        }    
        if (BIN != null) {
            ffmpeg = FFmpeg.atPath(BIN);
        } else {
            ffmpeg = FFmpeg.atPath();
        }

        //Sometimes ffprobe can't show the exact duration; use ffmpeg transcoding to NULL output to get it
        final AtomicLong durationMillis = new AtomicLong();
        FFmpegResult fFmpegResult = ffmpeg
                .addInput(
                        UrlInput.fromUrl(VIDEO_MP4)
                )
                .addOutput(new NullOutput())
                .setProgressListener(new ProgressListener() {
                    @Override
                    public void onProgress(FFmpegProgress progress) {
                        durationMillis.set(progress.getTimeMillis());
                    }
                })
                .execute();
        System.out.println("audio size - "+fFmpegResult.getAudioSize());
        System.out.println("Exact duration: " + durationMillis.get() + " milliseconds");
    }

    public void toRawAndPlay() throws Exception {
        ProgressListener listener = new ProgressListener() {
            @Override
            public void onProgress(FFmpegProgress progress) {
                System.out.println(progress.getFrame());
            }
        };

        // code derived from : https://stackoverflow.com/questions/32873596/play-raw-pcm-audio-received-in-udp-packets

        int sampleRate = 44100;//24000;//Hz
        int sampleSize = 16;//Bits
        int channels   = 1;
        boolean signed = true;
        boolean bigEnd = false;
        String format  = "s16be"; //"f32le"

        //https://trac.ffmpeg.org/wiki/audio types
        final AudioFormat af = new AudioFormat(sampleRate, sampleSize, channels, signed, bigEnd);
        final DataLine.Info info = new DataLine.Info(SourceDataLine.class, af);
        final SourceDataLine line = (SourceDataLine) AudioSystem.getLine(info);

        line.open(af, 4096); // format , buffer size
        line.start();

        OutputStream destination = new OutputStream() {
            @Override public void write(int b) throws IOException {
                throw new UnsupportedOperationException("Nobody uses this.");
            }
            @Override public void write(byte[] b, int off, int len) throws IOException {
                String o = new String(b);
                boolean showString = false;
                System.out.println("New output ("+ len
                        + ", off="+off + ") -> "+(showString?o:"")); 
                // output wave form repeatedly

                if (len % 2 != 0) {
                    len -= 1; // drop the trailing odd byte so only whole 16-bit frames are written
                }
                line.write(b, off, len);
                System.out.println("done round");
            }
        };

        // src : http://blog.wudilabs.org/entry/c3d357ed/?lang=en-US
        FFmpegResult result = FFmpeg.atPath(BIN).
            addInput(UrlInput.fromPath(Paths.get(VIDEO_MP4))).
            addOutput(PipeOutput.pumpTo(destination).
                disableStream(StreamType.VIDEO). //.addArgument("-vn")
                setFrameRate(sampleRate).            //.addArguments("-ar", sampleRate)
                addArguments("-ac", "1").
                setFormat(format)              //.addArguments("-f", format)
            ).
            setProgressListener(listener).
            execute();

        // shut down audio
        line.drain();
        line.stop();
        line.close();

        System.out.println("result = "+result.toString());
    }

    public static void main(String[] args) throws Exception {
        FFMpegToRaw raw = new FFMpegToRaw();
        raw.basicCheck();
        raw.toRawAndPlay();
    }
}

Thank You

  • If you're on macOS or Windows, you might want to consider using https://www.tagtraum.com/ffsampledsp/ to make this a lot more elegant. – Hendrik Apr 27 '20 at 14:11
  • @Hendrik - any link to any sample code? That would help. Thank you for your comment. – Sri Nithya Sharabheshwarananda Apr 27 '20 at 14:14
  • Simplify by reading in a file with a known audio frequency, say 100 Hz, and confirm your code works by printing out the raw audio curve in PCM format (just points on the audio curve) so you can see the data points varying up/down as per a sine curve ... this will let you confirm your code is solid – Scott Stensland Apr 27 '20 at 14:33
  • @ScottStensland - Thank you for your comment. I can hear the audio alright: it plays ok, then a screeching sound, then the next loop plays ok, then a screeching sound again. Still figuring out the issue. – Sri Nithya Sharabheshwarananda Apr 27 '20 at 14:37

1 Answer


I suspect your screeching sound stems from a half-filled buffer being handed to the audio system.

As indicated in the comment above, I'd use something like FFSampledSP (if on macOS or Windows) and then code like the following, which is much more Java-esque.

Just make sure the complete FFSampledSP jar is on your classpath and you should be good to go.

import javax.sound.sampled.*;
import java.io.File;
import java.io.IOException;

public class PlayerDemo {

    /**
     * Derive a PCM format.
     */
    private static AudioFormat toSignedPCM(final AudioFormat format) {
        final int sampleSizeInBits = format.getSampleSizeInBits() <= 0 ? 16 : format.getSampleSizeInBits();
        final int channels = format.getChannels() <= 0 ? 2 : format.getChannels();
        final float sampleRate = format.getSampleRate() <= 0 ? 44100f : format.getSampleRate();
        return new AudioFormat(AudioFormat.Encoding.PCM_SIGNED,
                sampleRate,
                sampleSizeInBits,
                channels,
                (sampleSizeInBits > 0 && channels > 0) ? (sampleSizeInBits/8)*channels : AudioSystem.NOT_SPECIFIED,
                sampleRate,
                format.isBigEndian()
        );
    }


    public static void main(final String[] args) throws IOException, UnsupportedAudioFileException, LineUnavailableException {
        final File audioFile = new File(args[0]);
        // open mp3 or whatever
        final Long durationInMicroseconds = (Long)AudioSystem.getAudioFileFormat(audioFile).getProperty("duration");
        // how long is the file, use AudioFileFormat properties
        System.out.println("Duration in microseconds (not millis!): " + durationInMicroseconds);
        // open the mp3 stream (not yet decoded)
        final AudioInputStream mp3In = AudioSystem.getAudioInputStream(audioFile);
        // derive a suitable PCM format that can be played by the AudioSystem
        final AudioFormat desiredFormat = toSignedPCM(mp3In.getFormat());
        // ask the AudioSystem for a source line for playback
        // that corresponds to the derived PCM format
        final SourceDataLine line = AudioSystem.getSourceDataLine(desiredFormat);

        // now play, typically in separate thread
        new Thread(() -> {
            final byte[] buf = new byte[4096];
            int justRead;
            // convert to raw PCM samples with the AudioSystem
            try (final AudioInputStream rawIn = AudioSystem.getAudioInputStream(desiredFormat, mp3In)) {
                line.open();
                line.start();
                while ((justRead = rawIn.read(buf)) >= 0) {
                    // only write bytes we really read, not more!
                    line.write(buf, 0, justRead);
                    final long microsecondPosition = line.getMicrosecondPosition();
                    System.out.println("Current position in microseconds: " + microsecondPosition);
                }
            } catch (IOException | LineUnavailableException e) {
                e.printStackTrace();
            } finally {
                line.drain();
                line.stop();
            }
        }).start();
    }
}

The regular Java API does not allow jumping to arbitrary positions. However, FFSampledSP contains an extension, i.e. a seek() method. To use it, just cast the rawIn from the example above to FFAudioInputStream and call seek() with a time and a TimeUnit.

Hendrik
  • Thank you, let me try this out. I will get back. – Sri Nithya Sharabheshwarananda Apr 27 '20 at 14:42
  • It is playing fine, now let me try to understand the code. – Sri Nithya Sharabheshwarananda Apr 27 '20 at 14:46
  • How do I seek and play the audio from arbitrary time points? As I explained, I need to sync audio and subtitles very accurately, so the user should be able to do this quite accurately. I leave the accurate seeking out of the scope of the current question, but how does somebody do even basic seeking? That's important. Thank you. Aside: now I may need to graphically view the audio data to see the silence points, or maybe I can just do it with calculation. That would be a different topic, not this question. – Sri Nithya Sharabheshwarananda Apr 27 '20 at 14:53
  • Looks good, this answer is good. You might want to make the changes related to seeking in the sample code itself, for the benefit of others who come to see this. Thank you. I will definitely use this for development, but maybe not for the final thing, because I intend to use JavaFX and GraalVM and make it run on mobile devices also, like Android and iOS. For my personal use, while I am experimenting, this is very, very good. However, most end-users do not have any device other than a mobile. Thank you again. – Sri Nithya Sharabheshwarananda Apr 27 '20 at 15:13
  • You can use the JavaFX [MediaPlayer](https://docs.oracle.com/javafx/2/api/javafx/scene/media/MediaPlayer.html) also from the desktop. It supports mp3 out of the box and has a `seek()` method. Using it may be the simpler solution right away. – Hendrik Apr 27 '20 at 15:38
  • I need the waveform; I don't know if I can get that from MediaPlayer, I think it will just play the audio. Additionally, I want to adjust the playback speed (which I think I can get from this https://github.com/waywardgeek/sonic/blob/master/Sonic.java ). The same Sanskrit chanting can take 8 hrs or 2 hrs, that is how much difference there can be! Yes, some recitals are 7-8 hrs long, and people remember them by heart. The other thing is, the visual of the waveform and the playback have to be somewhat in sync. I don't think the JavaFX MediaPlayer will be able to help with all these small things. I might be wrong. – Sri Nithya Sharabheshwarananda Apr 27 '20 at 15:53
  • No, you are absolutely right. You won't get the waveform with MediaPlayer. – Hendrik Apr 27 '20 at 15:54
  • How do I plot this visually, should I post another question? I mean, I can figure out the graphics; I was playing with the byte[] to understand how it represents a wave. – Sri Nithya Sharabheshwarananda Apr 27 '20 at 15:56
  • You need to convert the bytes to samples, i.e., ints or floats, and then plot those. For sample code for the conversion see https://github.com/hendriks73/jipes/blob/master/src/main/java/com/tagtraum/jipes/audio/AudioSignalSource.java#L186 If that does not help, please post a new question. – Hendrik Apr 27 '20 at 15:58
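
To expand on the byte-to-sample conversion mentioned in the last comment: for signed 16-bit little-endian PCM, each pair of bytes combines into one sample, which can then be normalised to a float in [-1, 1) for plotting. A small sketch (the `BytesToSamples` helper is hypothetical; it assumes mono s16le data):

```java
public class BytesToSamples {
    /** Convert signed 16-bit little-endian PCM bytes to floats in [-1, 1). */
    static float[] toFloats(byte[] pcm) {
        float[] out = new float[pcm.length / 2];
        for (int i = 0; i < out.length; i++) {
            int lo = pcm[2 * i] & 0xFF; // low byte, treated as unsigned
            int hi = pcm[2 * i + 1];    // high byte, keeps the sign
            out[i] = ((hi << 8) | lo) / 32768f;
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] pcm = {0x00, 0x40, 0x00, (byte) 0xC0}; // samples +16384 and -16384
        float[] f = toFloats(pcm);
        System.out.println(f[0] + " " + f[1]); // prints 0.5 -0.5
    }
}
```

The resulting float array is what the min/max-per-pixel waveform rendering would consume.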