Do you want to write everything yourself, or will you have a framework like VXML to work with? If you just want to write the VXML and JSP files, then you need a VXML browser. If you want to write everything completely yourself, then building a VXML browser is probably overkill. Either way, whether you build a VXML browser or something else, you will need to abstract the hardware: an IVR with a single voice/fax/modem needs different low-level code from an IVR with Dialogic cards connected to T1 lines, and both differ from one that handles only SIP calls.
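One common way to do that abstraction is to write the call flow against a small interface and put each kind of hardware behind its own implementation. The sketch below is purely illustrative: `TelephonyLine`, its methods, `SimpleCallFlow` and `RecordingLine` are hypothetical names invented for this example, not part of any real telephony API. A real implementation would wrap the vendor library for your boards or a SIP stack.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical hardware abstraction: the call flow only talks to this interface,
// so a voice modem, a Dialogic/T1 board or a SIP stack each needs one implementation.
interface TelephonyLine {
    void answer();
    void play(String promptName);   // play a recorded prompt
    String collectDigits(int max);  // read up to max DTMF digits ("" if none)
    void transfer(String destination);
    void hangUp();
}

// Call flow written purely against the abstraction, with no hardware details.
final class SimpleCallFlow {
    static void run(TelephonyLine line) {
        line.answer();
        line.play("welcome");
        String choice = line.collectDigits(1);
        if (choice.equals("0")) {
            line.transfer("operator");
        } else {
            line.play("goodbye");
            line.hangUp();
        }
    }
}

// Stand-in implementation with no hardware at all: it records every action and
// returns scripted DTMF input, which is handy for testing call flows.
final class RecordingLine implements TelephonyLine {
    final List<String> actions = new ArrayList<>();
    private final Deque<String> scriptedDigits;

    RecordingLine(List<String> digits) { this.scriptedDigits = new ArrayDeque<>(digits); }

    public void answer() { actions.add("answer"); }
    public void play(String promptName) { actions.add("play:" + promptName); }
    public String collectDigits(int max) {
        String d = scriptedDigits.isEmpty() ? "" : scriptedDigits.poll();
        if (d.length() > max) d = d.substring(0, max); // honour the digit limit
        actions.add("digits:" + d);
        return d;
    }
    public void transfer(String destination) { actions.add("transfer:" + destination); }
    public void hangUp() { actions.add("hangup"); }
}
```

With this split, supporting a new kind of line means adding another `TelephonyLine` implementation while `SimpleCallFlow` stays unchanged.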
Assuming you already have a VXML browser and just need to provide the VXML and JSP files, the question is whether you only need call flow or whether you will also do back-end integration. If your IVR is just going to answer the call, ask the caller for some input, play more information and then hang up or transfer, it is really easy: you don't need Java at all. Java is only needed for the back-end integration.
If you are going to have back-end integration, whether it is just a database or web services on another server, you need to make the back-end calls asynchronously. If callers hear more than a second of dead air without being warned, they will think the IVR is not working and hang up. So when the call arrives, send your initial request for data, then say "Welcome to my IVR", and then try to retrieve the result. If the result has not come back yet, say something else, such as "Please wait while I retrieve your details", and check again. If the request still doesn't return, you need a fallback plan: either say "That service isn't currently available" and then transfer or hang up, or offer a reduced-service IVR. Whatever you do, the caller should never hear more than a second of silence unless you have specifically told them you are waiting for something, either for their input or for their account details (or something similar).
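That sequence can be modelled in plain Java with `CompletableFuture`. This is a simplified, hypothetical sketch rather than VXML: `StallingGreeting` and `playPrompt` are invented names, `playPrompt` just records the text and sleeps to mimic playback time, and the "check twice, then fall back" policy is an assumption chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.function.Supplier;

// Simplified model of "start the lookup, keep talking, then check".
final class StallingGreeting {
    final List<String> spoken = new ArrayList<>(); // prompts in the order played
    private final long promptMillis;               // simulated playback time per prompt

    StallingGreeting(long promptMillis) { this.promptMillis = promptMillis; }

    // Hypothetical stand-in for playing audio: records the text and waits
    // as long as the prompt would take to play.
    private void playPrompt(String text) {
        spoken.add(text);
        try {
            Thread.sleep(promptMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Returns the back-end result, or null if the fallback was used.
    String greet(Supplier<String> backEndLookup, ExecutorService pool) {
        // Send the request first so it runs while the caller hears the greeting.
        CompletableFuture<String> pending = CompletableFuture.supplyAsync(backEndLookup, pool);
        playPrompt("Welcome to my IVR");
        if (!pending.isDone()) {
            // Not back yet: warn the caller instead of leaving dead air.
            playPrompt("Please wait while I retrieve your details");
        }
        if (pending.isDone() && !pending.isCompletedExceptionally()) {
            return pending.join();
        }
        // Still missing (or failed): stop waiting and tell the caller.
        playPrompt("That service isn't currently available");
        return null;
    }
}
```

In a real VXML deployment each "play, then check" step would be a separate round trip from the browser to a JSP rather than one Java method, but the ordering of prompts and checks is the same.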
To get this kind of asynchronous experience with VXML and JSPs, you will need an in-memory queue of requests and an executor service that provides worker threads to service those requests. That way you can queue a request and continue the IVR call flow, checking periodically for a result. The executor service will eventually process the request and store the result in it. Then, when the IVR checks and the result is available, it can use that information. If the result doesn't come back in time, however, the IVR will give up and stop checking, so you also need a static background thread that scans the queue and, after a certain length of time, cancels the request if the executor service is still processing it and then deletes it from the queue.
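A minimal sketch of that design, assuming each call keeps a request ID that it passes back on later JSP requests (for example as a query parameter). `PendingRequests`, `submit`, `poll`, `isTracked` and the pool size are hypothetical. A `ConcurrentHashMap` plays the role of the in-memory queue, an `ExecutorService` supplies the worker threads, and a `ScheduledExecutorService` stands in for the hand-written static reaper thread. It uses a `record`, so it needs Java 16 or later.

```java
import java.util.Map;
import java.util.Optional;
import java.util.UUID;
import java.util.concurrent.Callable;
import java.util.concurrent.CancellationException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

final class PendingRequests {
    private record Entry(Future<String> future, long createdNanos) {}

    private final Map<String, Entry> requests = new ConcurrentHashMap<>();
    private final ExecutorService workers = Executors.newFixedThreadPool(8);
    private final ScheduledExecutorService reaper = Executors.newSingleThreadScheduledExecutor();
    private final long maxAgeNanos;

    PendingRequests(long maxAgeMillis, long scanMillis) {
        this.maxAgeNanos = TimeUnit.MILLISECONDS.toNanos(maxAgeMillis);
        // Periodically cancel and drop requests the call flow has given up on.
        reaper.scheduleAtFixedRate(this::reap, scanMillis, scanMillis, TimeUnit.MILLISECONDS);
    }

    // Queue a back-end call; the IVR keeps the returned ID and carries on.
    String submit(Callable<String> backEndCall) {
        String id = UUID.randomUUID().toString();
        requests.put(id, new Entry(workers.submit(backEndCall), System.nanoTime()));
        return id;
    }

    // Non-blocking check: the result if finished, empty if still running,
    // failed, or no longer tracked.
    Optional<String> poll(String id) {
        Entry e = requests.get(id);
        if (e == null || !e.future().isDone()) return Optional.empty();
        requests.remove(id);
        try {
            return Optional.ofNullable(e.future().get()); // already done, so no blocking
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            return Optional.empty();
        } catch (ExecutionException | CancellationException ex) {
            return Optional.empty(); // back-end call failed or was cancelled
        }
    }

    boolean isTracked(String id) { return requests.containsKey(id); }

    private void reap() {
        long now = System.nanoTime();
        requests.entrySet().removeIf(en -> {
            boolean stale = now - en.getValue().createdNanos() > maxAgeNanos;
            if (stale) en.getValue().future().cancel(true); // interrupts a running worker
            return stale;
        });
    }

    void shutdown() {
        reaper.shutdownNow();
        workers.shutdownNow();
    }
}
```

The JSP that handles each "check again" request would call `poll` with the ID, and `isTracked` lets it tell "still working" apart from "gone, use the fallback".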
A VXML browser queues prompts and doesn't wait for them to finish playing until it next collects caller input. So if you are using a voice prompt to stall while you retrieve data, the prompt needs to be attached to a grammar that doesn't accept any valid input, just so that the IVR knows when the prompt has finished. If you absolutely need the result of the back-end request before continuing the call flow, you will need to loop, checking for the result until it either arrives or a fairly short timeout elapses (how long depends on whether or not you warned the caller that it could take a while). The same applies here: you will need to play a short silence attached to a grammar so that the call flow waits before checking again for the result. There normally isn't much point checking more often than every 100-200 ms.
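Because the loop itself lives in the VXML page, each pass through it is one request to a JSP that has to decide what the browser should do next. Below is a hypothetical helper such a JSP might call. `WaitPolicy`, `NextStep` and the 2 s / 15 s limits are placeholder assumptions for illustration, and the 150 ms re-check interval is simply a value inside the 100-200 ms range suggested here.

```java
import java.time.Duration;

// What the next VXML page returned to the browser should do.
enum NextStep { USE_RESULT, WAIT_AND_RECHECK, FALLBACK }

final class WaitPolicy {
    // Placeholder limits: keep an unannounced wait short, and allow a longer
    // one only if the caller was told it could take a while.
    static final Duration UNWARNED_LIMIT = Duration.ofSeconds(2);
    static final Duration WARNED_LIMIT = Duration.ofSeconds(15);
    // Length of the silence prompt played before the browser checks again.
    static final Duration RECHECK_INTERVAL = Duration.ofMillis(150);

    static NextStep decide(boolean resultReady, Duration waited, boolean callerWarned) {
        if (resultReady) return NextStep.USE_RESULT;
        Duration limit = callerWarned ? WARNED_LIMIT : UNWARNED_LIMIT;
        // Only loop again if another silence would still finish within the limit.
        return waited.plus(RECHECK_INTERVAL).compareTo(limit) <= 0
                ? NextStep.WAIT_AND_RECHECK
                : NextStep.FALLBACK;
    }
}
```

For `WAIT_AND_RECHECK`, the returned page would play the short silence attached to a grammar that accepts nothing and then submit back to the same JSP. `FALLBACK` would lead to the "service unavailable" or reduced-service path.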
If you aren't going to use a VXML browser but will write something yourself instead, the same advice mostly applies. However, if you are going to have back-end integration, I would recommend making the system always wait for each voice prompt to finish playing instead of just queueing it; it makes everything MUCH easier. You will still need the in-memory queue and an execution pool so that the back-end integration can run in the background.
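With prompts that block until playback finishes, the call flow becomes straight-line code: start the request, play the prompts, then wait only for whatever is left of the time budget. This sketch assumes a hypothetical `BlockingPrompts` interface whose `play` returns only after the audio has been played. `BlockingCallFlow` and `lookupWhileTalking` are likewise invented names. The back-end work still runs on a background pool.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Hypothetical prompt player whose play() returns only after the audio has finished.
interface BlockingPrompts {
    void play(String text);
}

final class BlockingCallFlow {
    // Returns the back-end result, or null if it did not arrive within budgetMillis.
    static String lookupWhileTalking(BlockingPrompts prompts, ExecutorService pool,
                                     Supplier<String> backEndCall, long budgetMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(budgetMillis);
        // The back-end call runs on the pool while the prompts play.
        CompletableFuture<String> result = CompletableFuture.supplyAsync(backEndCall, pool);
        prompts.play("Welcome to my IVR");
        if (!result.isDone()) {
            // Tell the caller we are waiting, so any silence that follows is expected.
            prompts.play("Please wait while I retrieve your details");
        }
        try {
            // Wait only for whatever is left of the overall budget.
            long remaining = Math.max(0L, deadline - System.nanoTime());
            return result.get(remaining, TimeUnit.NANOSECONDS);
        } catch (TimeoutException | ExecutionException e) {
            // Too slow or failed: fall back instead of leaving the caller hanging.
            prompts.play("That service isn't currently available");
            return null;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }
}
```

Because each `play` call only returns once the caller has heard the prompt, there is no need for the silence-plus-grammar trick: the code always knows exactly what the caller has heard so far.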