More precision from ffmpeg silencedetect

Question

I am trying to split a very large (70-hour) mp3 file into smaller files. My first step is to get the timestamps using the silencedetect filter in ffmpeg. It works fine for the first few timestamps, but the results are rounded to six significant digits, so the later timestamps lose their fractional part.

The code I am executing is:

ffmpeg -i input.mp3 -af silencedetect=d=3 -hide_banner -nostats -f null -

My results are:

Input #0, mp3, from 'input.mp3':
  Duration: 70:46:05.32, start: 0.050113, bitrate: 64 kb/s
    Stream #0:0: Audio: mp3, 22050 Hz, stereo, fltp, 64 kb/s
Stream mapping:
  Stream #0:0 -> #0:0 (mp3 (mp3float) -> pcm_s16le (native))
Press [q] to stop, [?] for help
Output #0, null, to 'pipe:':
  Metadata:
    encoder         : Lavf58.29.100
    Stream #0:0: Audio: pcm_s16le, 22050 Hz, stereo, s16, 705 kb/s
    Metadata:
      encoder         : Lavc58.54.100 pcm_s16le
[silencedetect @ 0x5590d08bd700] silence_start: 10.6895
[silencedetect @ 0x5590d08bd700] silence_end: 15.0054 | silence_duration: 4.31587
[silencedetect @ 0x5590d08bd700] silence_start: 446.958
[silencedetect @ 0x5590d08bd700] silence_end: 450.959 | silence_duration: 4.00168
[silencedetect @ 0x5590d08bd700] silence_start: 1168.17
[silencedetect @ 0x5590d08bd700] silence_end: 1172.17 | silence_duration: 4.00694
[silencedetect @ 0x5590d08bd700] silence_start: 1880.8
[silencedetect @ 0x5590d08bd700] silence_end: 1884.8 | silence_duration: 3.99265

...

[silencedetect @ 0x5590d08bd700] silence_start: 123108
[silencedetect @ 0x5590d08bd700] silence_end: 123111 | silence_duration: 3.61946
[silencedetect @ 0x5590d08bd700] silence_start: 123286
[silencedetect @ 0x5590d08bd700] silence_end: 123290 | silence_duration: 4.01646
[silencedetect @ 0x5590d08bd700] silence_start: 124229
[silencedetect @ 0x5590d08bd700] silence_end: 124233 | silence_duration: 4.01846
[silencedetect @ 0x5590d08bd700] silence_start: 124442
[silencedetect @ 0x5590d08bd700] silence_end: 124446 | silence_duration: 4.0298

...

Rounding to the nearest second is not sufficient for my purposes. Ideally, I would like each timestamp to be accurate to a hundredth of a second or so. Does anybody know a way to achieve this?

kesh · Accepted Answer · 2022-11-13T19:36:58.693

1

Append ametadata=print:file=- to the filterchain and parse stdout in your program. For each frame that carries metadata, it prints the frame number, the pts, and the time in seconds. Grab the time_base from ffprobe and you can compute an accurate time as pts * time_base.
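As a rough illustration of the parsing step, here is a minimal sketch in plain Python. It assumes the printed text consists of a "frame:... pts:... pts_time:..." header line followed by key=value lines, and that the time base seen by the filter is 1/sample_rate (both are assumptions to check against your own output). The function name parse_silence_events and the sample text are made up for illustration.

```python
import re
from fractions import Fraction

# Matches the per-frame header line assumed to be printed by ametadata=print.
FRAME_RE = re.compile(r"^frame:\s*(\d+)\s+pts:\s*(-?\d+)\s+pts_time:\s*(\S+)")


def parse_silence_events(lines, time_base):
    """Return a list of (key, exact_frame_time, reported_value) tuples.

    exact_frame_time is pts * time_base as a Fraction, so it does not
    suffer from the six-significant-digit rounding of the printed values.
    """
    events = []
    pts = None
    for raw in lines:
        line = raw.strip()
        match = FRAME_RE.match(line)
        if match:
            # Remember the integer pts of the frame carrying the metadata.
            pts = int(match.group(2))
        elif pts is not None and line.startswith("lavfi.silence_"):
            key, _, value = line.partition("=")
            events.append((key, pts * time_base, value))
    return events


# Hypothetical sample of the printed text (values invented for illustration).
SAMPLE = """\
frame:4712005 pts:2714114880 pts_time:123089
lavfi.silence_start=123086
"""

# Assumption: the filter's time base is 1 / sample rate (22050 Hz here).
TIME_BASE = Fraction(1, 22050)

events = parse_silence_events(SAMPLE.splitlines(), TIME_BASE)
for key, frame_time, reported in events:
    print(key, float(frame_time), reported)
```

Note that the exact time computed here is the timestamp of the frame on which the metadata was printed, which is not necessarily the start of the silence itself. A comment on this answer reports an offset equal to the d parameter, so you may need to subtract d or cross-check against the rounded reported value.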

If you're using Python, you can try the following with my ffmpegio package:

from ffmpegio import analyze as ffa, probe as ffp
from pprint import pprint

url = "BigBuckBunny.mp4"

# time base of the first audio stream (seconds per pts tick)
tb = next(info for info in ffp.streams_basic(url)
          if info["codec_type"] == "audio")["time_base"]
print(f'time_base = {tb} s')

# detect silent intervals (at least 1 s long) in the first 5 minutes, reported in pts
(logger,) = ffa.run(url, ffa.SilenceDetect(d=1), time_units="pts", to=60 * 5)

# convert each (start, end) pts pair to exact seconds
pprint([(pts0 * tb, pts1 * tb) for pts0, pts1 in logger.output.interval])

This returns the silent intervals as exact fractions of a second:

time_base = 1/44100 s
[(Fraction(947456, 11025), Fraction(958976, 11025)),
 (Fraction(976384, 11025), Fraction(39680, 441)),
 (Fraction(1018624, 11025), Fraction(146176, 1575))]
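If you need the timestamps to a hundredth of a second, as the question asks, exact fractions like the ones printed above can be formatted without going through floating point. The following is only a sketch; the helper name to_timestamp is hypothetical and not part of ffmpegio.

```python
from fractions import Fraction


def to_timestamp(seconds, decimals=2):
    """Format a non-negative time in seconds as HH:MM:SS.ff.

    Works on Fraction (or int) input using integer arithmetic only.
    decimals must be at least 1.
    """
    scale = 10 ** decimals
    # Integer count of 1/scale-second units, rounded to nearest.
    total = round(Fraction(seconds) * scale)
    whole, frac = divmod(total, scale)
    minutes, secs = divmod(whole, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{frac:0{decimals}d}"


# First interval start from the output above.
print(to_timestamp(Fraction(947456, 11025)))
```

The resulting strings use the HH:MM:SS.ff form that ffmpeg accepts for time options such as -ss and -to, so they can be used directly when cutting the file.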

edited Nov 13 '22 at 19:36

answered Nov 13 '22 at 19:23

kesh


Thank you! I was able to get what I needed using your package. I should point out, though, that the code you supplied here had two issues. First, the time base I got was incorrect. The mp3 file I'm using has a sample rate of 22.05 kHz, but your function gave me tb=1/14112000. I hardcoded tb=1/22050 to fix that. Second, even with the corrected time base, my timestamps were off by exactly the silence duration. I'm using d=3, so all my timestamps were 3 seconds ahead. This was an easy fix, though. Just wanted to let you know in case there's a bug somewhere in your repo. – nerfherder616 Nov 14 '22 at 03:57
Interesting... Will look into the timebase value. The values in `logger.output` are exactly what FFmpeg returns, so the timestamp offset issue is likely a bug in FFmpeg... – kesh Nov 14 '22 at 04:17

score 0 · Answer 2 · answered Nov 13 '22 at 07:07

Unfortunately, this is hardcoded in FFmpeg (libavutil/timestamp.h):

static inline char *av_ts_make_time_string(char *buf, int64_t ts, AVRational *tb)
{
    if (ts == AV_NOPTS_VALUE) snprintf(buf, AV_TS_MAX_STRING_SIZE, "NOPTS");
    else                      snprintf(buf, AV_TS_MAX_STRING_SIZE, "%.6g", av_q2d(*tb) * ts);
    return buf;
}

The relevant part is the %.6g format specifier, which limits the output to six significant digits.
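Python's % formatting follows the same %g rules as C's printf, so the effect is easy to reproduce. In this small sketch the input values are made up to resemble the log in the question, and the helper name ffmpeg_time_string is hypothetical:

```python
def ffmpeg_time_string(seconds):
    """Format a time the way the "%.6g" specifier does: six significant digits."""
    return "%.6g" % seconds


# Made-up underlying values resembling the timestamps in the question's log.
for value in (10.68952, 1880.8, 123108.4567):
    print(value, "->", ffmpeg_time_string(value))
```

Once the integer part of the timestamp reaches six digits (around 100000 seconds, or roughly 27.8 hours), no digits are left for the fractional part, which matches what the question shows for the later timestamps.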

You'll have to submit a patch to get it changed.

Brad


That won't happen. However, the usual way is to add func2() with an adjustable parameter and to call func2() within func() with the old hardcoded value. – Gyan Nov 13 '22 at 10:39
