This is one of the those responses that may not be enough for someone who doesn't know what I am talking about and too much for someone who does. Here goes:
It sounds to me as if you need to build an Action with the Actions SDK rather than with Dialog flow. Then you implement a text "intent" in your Action - i.e. one that runs every time the user speaks something. In that text intent you ask the AoG platform for the text - see getRawInput(). Now you do two things. One, you take that raw input and pass it to your bot. Two, you return a promise to tell AoG that you are working on a reply but you don't have it yet. Once the promise is fulfilled - i.e. when your bot replies - you reply with the text you got from your bot.
I have a sample Action called the French Parrot here https://github.com/unclewill/french_parrot. As far as speech goes it simply speaks back whatever it hears as a parrot would. It also goes to a translation service to translate the text and return the (loose) French equivalent.
Your mission, should you choose to accept it, is to take the sample, rip out the code that goes to the translation service and insert the code that goes to your bot. :-)
Two things I should mention. One, it is not "idiomatic" Node or JavaScript you'll find in my sample. What can I say - I think the rest of the world is confused. Really. Two, I have a minimal sample of about 50 lines that eschews the translation here https://github.com/unclewill/parrot. Another option is to use that as a base and add code to call your bot and the Promise-y code to wait on it to it.
If you go the latter route remove the trigger phrases from the action package (action.json).