I'm assuming your program is processing two audio streams, and each stream is providing you with a series of audio buffers.
If so, then the number of frames of audio in each buffer isn't a fundamental characteristic of the audio; it is just a side effect of how the audio samples were packaged together (e.g. the producer of stream A decided to put 1000 samples into each buffer, while the producer of stream B decided to put just 600).
Ideally you could tell both of your stream producers to give you audio buffers with a fixed (and equal) number of frames in them, so that you could just add the samples together verbatim. If you can't get them to do that, you'll need to implement some kind of buffering mechanism, where you hold the "extra" frames from the larger of the two buffers in a FIFO queue and then use them as the first samples in your next mixing operation. That can get a little complicated, so unless performance is your primary concern, I suggest simply keeping a FIFO queue of audio frames (e.g. a std::deque<float> or similar) for each input stream. Always push all of the newly received audio frames from an input stream onto the tail of that stream's FIFO queue, and pop frames from the head of each FIFO queue as needed when you mix audio together. That way you decouple the mixing of audio from the size of the input audio buffers, so your mixing code will work regardless of what buffer sizes the input streams produce. (Note that the largest mixed output buffer you can produce at any moment is equal to the number of audio frames in your shortest FIFO queue at that moment.)
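Here's a minimal sketch of that FIFO approach, assuming mono float samples and exactly two input streams. `TwoStreamMixer`, `pushA()`, `pushB()`, `available()` and `mix()` are hypothetical names I made up for illustration, not part of any library. It just sums the samples with no gain or clipping applied, so you may need to scale the result if the sum can exceed your output's range.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical helper: one FIFO per input stream, so the mixer doesn't care
// what buffer sizes the producers happen to deliver.
class TwoStreamMixer {
public:
    // Append a newly received buffer from stream A to the tail of A's FIFO.
    void pushA(const float* samples, std::size_t count) {
        assert(samples != nullptr || count == 0);
        fifoA_.insert(fifoA_.end(), samples, samples + count);
    }

    // Append a newly received buffer from stream B to the tail of B's FIFO.
    void pushB(const float* samples, std::size_t count) {
        assert(samples != nullptr || count == 0);
        fifoB_.insert(fifoB_.end(), samples, samples + count);
    }

    // Frames that can be mixed right now: limited by the shorter FIFO.
    std::size_t available() const {
        return std::min(fifoA_.size(), fifoB_.size());
    }

    // Pop up to maxFrames from the head of each FIFO and return their sum.
    // Plain sum only: no gain, no clipping.
    std::vector<float> mix(std::size_t maxFrames) {
        const std::size_t n = std::min(maxFrames, available());
        std::vector<float> out(n);
        for (std::size_t i = 0; i < n; ++i) {
            out[i] = fifoA_.front() + fifoB_.front();
            fifoA_.pop_front();
            fifoB_.pop_front();
        }
        return out;
    }

private:
    std::deque<float> fifoA_;
    std::deque<float> fifoB_;
};
```

For example, after pushing a 1000-frame buffer into A and a 600-frame buffer into B, `available()` returns 600; the 400 leftover frames of A stay queued and become the first frames of the next mix.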
Handling different sample rates is a more difficult problem, especially if you want your output audio to have decent sound quality. To handle it properly, you'll need a sample-rate converter (such as the one provided by libsamplerate) to convert one stream's sample rate to match the other's (or, if you prefer, to convert both streams to the sample rate of your output stream). Once you've done that, you can add the two matched-rate streams together sample by sample, as before.
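To illustrate the "convert first, then add" idea without pulling in a library, here's a deliberately naive linear-interpolation resampler for mono audio (`resampleLinear()` is a hypothetical name). To be clear, this is a swapped-in technique, not what libsamplerate does: linear interpolation causes aliasing and high-frequency loss, so it won't give you the quality of a proper sample-rate converter. It also converts a whole buffer at once and keeps no state between calls, whereas a real streaming converter carries state across buffers to avoid glitches at the boundaries. Treat it only as a placeholder showing where the conversion step goes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, low-quality stand-in for a real sample-rate converter.
// Converts a whole mono buffer from srcRate to dstRate by linear interpolation.
std::vector<float> resampleLinear(const std::vector<float>& in,
                                  double srcRate, double dstRate) {
    assert(srcRate > 0.0 && dstRate > 0.0);
    if (in.empty()) return {};

    const double ratio = dstRate / srcRate;  // output frames per input frame
    const std::size_t outLen = static_cast<std::size_t>(in.size() * ratio);
    std::vector<float> out(outLen);

    for (std::size_t i = 0; i < outLen; ++i) {
        // Position of output frame i, measured in input frames.
        const double pos = static_cast<double>(i) / ratio;
        // Clamp indices so we never read past the last input sample.
        const std::size_t i0 = std::min(static_cast<std::size_t>(pos), in.size() - 1);
        const std::size_t i1 = std::min(i0 + 1, in.size() - 1);
        const double frac = pos - static_cast<double>(i0);
        // Weighted blend of the two neighbouring input samples.
        out[i] = static_cast<float>(in[i0] * (1.0 - frac) + in[i1] * frac);
    }
    return out;
}
```

Once stream B has been converted to stream A's rate this way (or, better, with a real converter), you could push the result into B's FIFO queue and mix exactly as before.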