If you want the most precise timing, render the audio before playback. Mixing every note into a single audio file (or buffer) means you only play back one track, and every sound in it lands at exactly the right time. Users can also play notes live with little to no delay, because only one other track is playing. You can do most of this rendering on a background thread as the user makes changes, so the main thread is not blocked by the processing.
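To make "rendering" concrete, here is a minimal sketch of an offline mix, written in Python only for illustration. It assumes each note is already decoded to mono float samples in [-1.0, 1.0] at a fixed 44.1 kHz rate. `NoteEvent`, `render_mix` and `render_in_background` are hypothetical names, not part of any audio framework. Each note is summed into one output buffer at a sample-accurate offset, and the work is handed to a background executor so the caller does not wait on it.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from dataclasses import dataclass
from typing import List, Sequence

SAMPLE_RATE = 44_100  # assumed sample rate in Hz


@dataclass
class NoteEvent:
    """Hypothetical note: a start time plus its mono samples in [-1.0, 1.0]."""
    start_seconds: float
    samples: Sequence[float]


def render_mix(events: List[NoteEvent], sample_rate: int = SAMPLE_RATE) -> List[float]:
    """Mix all events into one mono buffer, each at a sample-accurate offset."""
    if not events:
        return []
    # The output must be long enough to hold the latest-ending note.
    length = max(round(e.start_seconds * sample_rate) + len(e.samples) for e in events)
    out = [0.0] * length
    for e in events:
        offset = round(e.start_seconds * sample_rate)
        for i, s in enumerate(e.samples):
            out[offset + i] += s  # overlapping notes are summed
    # Hard-clip to full scale; a real mixer would more likely apply gain or a limiter.
    return [max(-1.0, min(1.0, s)) for s in out]


# A single worker thread: re-renders run one at a time, off the caller's thread.
_render_pool = ThreadPoolExecutor(max_workers=1)


def render_in_background(events: List[NoteEvent], sample_rate: int = SAMPLE_RATE) -> "Future[List[float]]":
    # Pass a snapshot so later edits by the user don't race with the render.
    return _render_pool.submit(render_mix, list(events), sample_rate)
```

When the future completes, you would hand the finished buffer to whatever single-track player your platform provides. Note that in CPython a CPU-bound render on a thread does not run in parallel because of the GIL; the structure (snapshot the edits, render off the main thread, swap in the result) is the point, not the threading primitive.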
The downsides of pre-rendering are rendering time (anywhere from a fraction of a second to a full minute for complex audio on an older device), memory management, and more complex code. It does, however, produce the best results.
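To get a feel for the memory side, the size of an uncompressed rendered buffer is just duration × sample rate × channels × bytes per sample. The sketch below assumes 44.1 kHz, stereo, 32-bit float samples; `pcm_buffer_bytes` is a hypothetical helper, and your actual format may differ.

```python
def pcm_buffer_bytes(seconds: float, sample_rate: int = 44_100,
                     channels: int = 2, bytes_per_sample: int = 4) -> int:
    """Bytes needed for an uncompressed PCM buffer of the given duration."""
    frames = int(seconds * sample_rate)  # one frame = one sample per channel
    return frames * channels * bytes_per_sample


one_minute = pcm_buffer_bytes(60)
print(one_minute)  # 21168000 bytes, roughly 20.2 MiB
```

So one minute of rendered stereo audio at these settings costs about 21 MB, which is why long or frequently re-rendered projects may need to render in chunks or stream to disk rather than keep everything in memory.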
If you're manipulating notes on the fly, I would recommend handling events as they come: when the user makes a change, play the new audio file. This should be relatively simple to implement.
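A minimal sketch of handling events as they come, again in Python for illustration only. `ChangePlayer` is a hypothetical class, and the `play` callable it takes stands in for whatever "play this file" call your platform's audio API offers. The UI only enqueues the change, and a worker thread plays each file in arrival order, so nothing is pre-rendered and the caller never waits on audio.

```python
import queue
import threading
from typing import Callable, Optional


class ChangePlayer:
    """Plays the audio file for each change as soon as the change arrives."""

    def __init__(self, play: Callable[[str], None]) -> None:
        self._play = play  # hypothetical playback function supplied by the caller
        self._events: "queue.Queue[Optional[str]]" = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def on_change(self, audio_file: str) -> None:
        # Called from the UI thread: enqueue and return immediately.
        self._events.put(audio_file)

    def close(self) -> None:
        self._events.put(None)  # sentinel tells the worker to stop
        self._worker.join()

    def _run(self) -> None:
        while True:
            item = self._events.get()
            if item is None:
                break
            self._play(item)  # play each file in the order changes arrived
```

If your platform's play call is already non-blocking, you can drop the queue and thread and call it directly from the change handler; the trade-off either way is that timing depends on when each event happens to be processed, not on a rendered timeline.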
If you're building some sort of MIDI sequencer, then I'd highly recommend pre-rendering the audio. It requires a fair amount of processing power and the programming can be difficult, but the results are much, much better for the user.