
I'm currently researching a problem involving DOA (direction of arrival) regression for an audio source, and I need to generate training data in the form of audio signals of moving sound sources. In particular, I have the stationary sound files, and I need to simulate a source and microphone(s) whose separation changes over time to reflect movement.

Is there any software available that could do the trick? I've looked into pyroomacoustics and VA, as well as other potential libraries, but none of them seem to handle moving audio sources, apparently due to the difficulty of simulating the Doppler effect.

If I were to write my own simulation code for this, how difficult would it be? My use case is an audio source and a microphone in some 2D landscape, each moving with its own velocity, where I would collect the microphone's recording as an audio file.

Jihan Yin

3 Answers


Some speculation on my part here, as I have only dabbled in writing some aspects of what you are asking about and am not experienced with any particular libraries. There is a good chance that something suitable already exists and will turn up.

That said, I wonder if it would be possible to use either the Unreal or Unity game engine. As far as I can remember, both let you load your own audio cues and support 3D sound, including Doppler.

As far as writing your own, a lot depends on what you already know. With a single-point mic (as opposed to stereo), the pitch shifting involved is not that hard. There is a technique that steps through the audio file's sample data at a variable rate, using linear interpolation for read positions that fall between data points; it is considered to have sufficient fidelity for most purposes (a rough sketch follows below). There is a lot of trigonometry, too, to track the changes in relative velocity.
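
Since the other answers here use Python, here is a minimal, hedged sketch of that interpolation idea in NumPy. The function name and the constant-ratio example are my own invention purely for illustration; a real Doppler simulation would feed in a time-varying ratio derived from the source's approach speed.

import numpy as np

def variable_rate_read(samples, ratios):
    # Step through `samples` at the given per-output-sample playback ratios
    # (>1 reads faster, i.e. pitches up), linearly interpolating between
    # neighboring input samples for fractional read positions.
    positions = np.cumsum(ratios) - ratios[0]
    positions = positions[positions <= len(samples) - 1]
    idx = positions.astype(int)       # integer part: left neighbor
    frac = positions - idx            # fractional part: blend weight
    nxt = np.minimum(idx + 1, len(samples) - 1)
    return (1 - frac) * samples[idx] + frac * samples[nxt]

# Example: a constant ratio of 1.02 pitches a 440 Hz tone up by about 2 %
t = np.linspace(0, 1, 44100, endpoint=False)
tone = np.sin(2 * np.pi * 440 * t)
shifted = variable_rate_read(tone, np.full(t.shape, 1.02))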

If we are dealing with stereo, though, it does get more complicated, depending on how far you want to go with it. The head masks high frequencies, so real-time filtering would be needed. It would also be good to implement a delay to match the different arrival times at each ear. And if you start talking about pinnae, I'm way out of my league.
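
To make those two stereo cues concrete, here is a deliberately crude sketch (my own, not from any library, and no substitute for a proper HRTF): a whole-sample interaural delay for the far ear plus a one-pole low-pass standing in for the head shadow. The function name and the default filter coefficient are arbitrary assumptions.

import numpy as np
from scipy.signal import lfilter

def crude_binaural(mono, sample_rate, itd_seconds, shadow_alpha=0.3):
    # Delay the far ear by the interaural time difference (typically well
    # under a millisecond) and low-pass it to mimic the head masking highs.
    delay = int(round(itd_seconds * sample_rate))
    far = np.concatenate([np.zeros(delay), mono])[:len(mono)]
    # One-pole low-pass: y[n] = a * x[n] + (1 - a) * y[n - 1]
    far = lfilter([shadow_alpha], [1.0, -(1.0 - shadow_alpha)], far)
    return np.stack([mono, far])  # shape (2, N): near ear, far ear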

Phil Freihofner

As of now it seems that pyroomacoustics does not support moving sound sources. However, do check a possible workaround suggested by the developers in Issue #105, where the idea of a time-varying convolution on a dense microphone array is proposed.
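
For a rough idea of what a time-varying convolution can look like, here is a hedged sketch (my own illustration, not pyroomacoustics API): it assumes you have already simulated a sequence of impulse responses `irs` at successive source positions, and crossfades each input block between the two neighboring responses before overlap-adding.

import numpy as np
from scipy.signal import fftconvolve

def time_varying_convolve(signal, irs, block=1024):
    # irs: array of shape (num_positions, ir_length), one impulse response
    # per block boundary along the source's path.
    out = np.zeros(len(signal) + irs.shape[1])
    ramp = np.linspace(0.0, 1.0, block, endpoint=False)
    for k in range(len(signal) // block):
        seg = signal[k * block:(k + 1) * block]
        ir_a = irs[min(k, len(irs) - 1)]      # response at the block's start
        ir_b = irs[min(k + 1, len(irs) - 1)]  # response at the block's end
        # Split each input sample between the two responses and overlap-add
        wet = fftconvolve(seg * (1 - ramp), ir_a) + fftconvolve(seg * ramp, ir_b)
        out[k * block:k * block + len(wet)] += wet
    return out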

Thejasvi

Recently I had to do the same: synthesize recordings at specific positions in space with a moving sound source. I was able to simulate the system in Python using NumPy (and a tiny bit of SciPy, which could easily be replaced with NumPy if you wish).

In my case, the sound source moves with constant velocity in a straight line. The basic concept of the simulation is that, for each sample of the original recording, I calculate how much time that sample / part of the sound wave would take to reach each of the microphones. Taking into account both the speed of sound and the velocity of the sound source, this accounts for both the Doppler effect and the delay in the sound reaching each microphone.

Once we have arrays describing the arrival time of each sample of the original recording at each synthetic microphone location, we resample the original recording at a fixed sample rate, which we can do with NumPy's interp function.

The code looks something like this:

import numpy as np
from scipy.spatial.distance import cdist

# Read the original recording and create an array with the time of each sample.
# read_wav is a small helper assumed to return the mono samples, the sample
# rate in Hz, and the duration in seconds (e.g. built on scipy.io.wavfile.read).
sample, sample_rate, sample_duration = read_wav(my_file)
sample_length = int(sample_duration * sample_rate)
sample_start_secs, sample_end_secs = 0.0, sample_duration
sample_time = np.linspace(sample_start_secs, sample_end_secs, sample_length)

# Configure our soundscape
speed_of_sound = 343  # m/s, in air at roughly 20 degrees C
source_start = np.asarray([1, 0.5, 0])
# We could model this with an acceleration if we wanted to
velocity = np.broadcast_to([[-3, -4, 0]], (sample_length, 3))
position = source_start + velocity * (sample_time - sample_time[0])[:, np.newaxis]
mic_locs = np.asarray([
    [0, 3, 0],
    [3, 0, 0],
    [0, -3, 0],
    [-3, 0, 0]
])
num_mics = mic_locs.shape[0]

# Calculate the arrival times for each microphone
unit_vectors = mic_locs - position[:, np.newaxis, :]
unit_vectors /= np.linalg.norm(unit_vectors, axis=2, keepdims=True)
# This einsum is basically a dot product between the velocity and the unit vectors
# for each point in time
approach_speeds = np.einsum('ij,ikj->ik', velocity, unit_vectors)
# cdist is from scipy.spatial.distance, but could easily be implemented in NumPy
distances = cdist(position, mic_locs)
# Arrival time of each emitted sample at each microphone: emission time plus
# travel time, with the source's approach speed folded into the denominator
arrival_times = sample_time[:, np.newaxis] + distances / (approach_speeds + speed_of_sound)

# Resample the original recording to get the synthetic recordings, keeping
# only the time window during which every microphone is receiving sound
recordings_start = arrival_times[0, :].max()
recordings_end = arrival_times[-1, :].min()
recording_length = recordings_end - recordings_start
recording_samples = int(recording_length * sample_rate)
recordings = np.zeros((num_mics, recording_samples))
recordings_time = np.linspace(recordings_start, recordings_end, recording_samples)
for i in range(num_mics):
    # np.interp expects increasing x values, which arrival_times provides as
    # long as the source stays well below the speed of sound
    recordings[i, :] = np.interp(recordings_time, arrival_times[:, i], sample, left=0, right=0)
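
Since the question asks for the microphone recordings as audio files, the synthetic signals can then be written out, for instance with scipy.io.wavfile; the 16-bit scaling below is one arbitrary choice, not part of the original code.

from scipy.io import wavfile

# Write one WAV file per synthetic microphone, normalized to 16-bit PCM
peak = np.abs(recordings).max()
for i in range(num_mics):
    pcm = np.int16(recordings[i] / peak * 32767)
    wavfile.write(f"mic_{i}.wav", sample_rate, pcm)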
totokaka