The SSML prosody element can take a value representing a relative change, which may be a percentage value (e.g. +50% or -30%).

What should that be a percentage of? Is it the Hz value of the current pitch (so an octave interval (i.e. +12st) is the same as +100%)? Or is it related to something else, such as the range between x-low and x-high (so x-low +50% is the same as medium, then another +50% is x-high)? Is it simply left up to the implementers to decide?

I understand that SSML is not a system for marking up music, and that this represents the "baseline pitch" of the utterance, rather than the exact pitch at which the whole utterance is to be delivered. I just wish to know whether certain expressions can be considered equivalent.

Paul Butcher

1 Answer

My understanding is that the percentage is relative to the current pitch in Hz, so -50% is an octave down and +100% is an octave up.

The ratio for each semitone is calculated as a power of the 12th root of 2. So the first semitone above is a ratio of 1.0595, or a percent change of 5.95%; the second is 1.0595^2, which gives a percent change of 12.25%, and so on. The first semitone below is -5.61%, because the ratio there is the inverse of the 12th root of 2 (1/1.0595, or about 0.9439).

In general the relative percent change for n semitones is computed as ((2^(1/12))^n - 1) * 100, or approximately (1.0595^n - 1) * 100, where n is an integer (negative for semitones down).
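As a rough check of the formula, here is a minimal Python sketch. It assumes the interpretation given in this answer (the percentage is relative to the current pitch in Hz), which is my understanding rather than something confirmed here from the spec. The function names `semitones_to_percent` and `percent_to_semitones` are made up for illustration.

```python
import math

# Frequency ratio of one equal-tempered semitone: the 12th root of 2.
SEMITONE_RATIO = 2 ** (1 / 12)


def semitones_to_percent(n: float) -> float:
    """Percent change in frequency for a shift of n semitones.

    Assumes the percentage is relative to the current pitch in Hz
    (the interpretation in this answer, not a quoted spec rule).
    """
    return (SEMITONE_RATIO ** n - 1) * 100


def percent_to_semitones(percent: float) -> float:
    """Inverse mapping: semitone shift equivalent to a percent change."""
    if percent <= -100:
        raise ValueError("percent must be greater than -100")
    return 12 * math.log2(1 + percent / 100)


if __name__ == "__main__":
    # Print the percent change for a few semitone shifts.
    for n in (-12, -1, 1, 2, 12):
        print(f"{n:+d}st is about {semitones_to_percent(n):+.2f}%")
```

Rounded to two decimals, this gives the figures quoted above: 5.95% for +1st, 12.25% for +2st, -5.61% for -1st, and +100% / -50% for an octave up or down.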

user650881