The Assistant's Media response is the nearest parallel to Alexa's AudioPlayer, although there are clearly differences between the two:
Alexa's playback is done outside the context of a session / conversation. So once you start a playback, you only have the playback controls available. Assistant Media controls are part of a conversation, so you can fully handle anything the user might say.
One consequence of this is that Alexa treats the playback as the result of the skill, while the Assistant treats it as part of the Action.
Google only sends an event when media playback has finished, and doesn't give any indication of why it has finished. Alexa reports more of the controls and has more events describing the state of the playback.
This makes it fairly easy to "queue up" the next audio for Alexa, but that brings additional complexity for how to handle when the queue ends up being wrong at the last moment. The Assistant doesn't have any way to queue the next audio, so there ends up being a gap between the audio ending and the next beginning while the event is handled on the server.
Although the approaches are slightly different, both seem to offer a basic long-audio playback service.
This doesn't sound like what you are trying to do, but if you are looking for something slightly more static, you can also look at the content actions that Google supports.