While cleaning up my code base, I found one notable bottleneck:
/**
 * Gets an InputStream to MP3Data for the returned information from a request
 * @param synthText List of Strings you want to be synthesized into MP3 data
 * @return Returns an input stream of all the MP3 data that is returned from Google
 * @throws IOException Throws exception if it cannot complete the request
 */
public InputStream getMP3Data(List<String> synthText) throws IOException{
    //Uses an executor service pool for concurrency. Limit to 1000 threads max.
    ExecutorService pool = Executors.newFixedThreadPool(1000);
    //Stores the Future (Data that will be returned in the future)
    Set<Future<InputStream>> set = new LinkedHashSet<Future<InputStream>>(synthText.size());
    for(String part: synthText){ //Iterates through the list
        Callable<InputStream> callable = new MP3DataFetcher(part);//Creates Callable
        Future<InputStream> future = pool.submit(callable);//Begins to run Callable
        set.add(future);//Adds the response that will be returned to a set.
    }
    pool.shutdown();//Accepts no new tasks; submitted tasks still run and idle threads are released.
    List<InputStream> inputStreams = new ArrayList<InputStream>(set.size());
    for(Future<InputStream> future: set){
        try {
            inputStreams.add(future.get());//Blocks until this future's data is available.
        } catch (ExecutionException e) {//Thrown if the MP3DataFetcher encountered an error.
            Throwable ex = e.getCause();
            if(ex instanceof IOException){
                throw (IOException)ex;//Downcasts and rethrows it.
            }
        } catch (InterruptedException e){//Will probably never be called, but just in case...
            Thread.currentThread().interrupt();//Restores the interrupt flag so callers can see it.
        }
    }
    return new SequenceInputStream(Collections.enumeration(inputStreams));//Sequences the stream.
}
This method is quite simple: it fetches a number of InputStreams, each corresponding to a different MP3 from the internet, and sequences them together. Unfortunately, there is significant latency in doing so, 250 ms or so, and this causes problems when sequencing a larger number of MP3s. The blocking call is get(), of course, which has to wait for each thread to connect to the server and begin reading data to the local machine. That would be fine on its own, but it immediately creates a massive bandwidth bottleneck: a large amount of data is downloaded up front just so SequenceInputStream can sequence it. Is there a way to have a class like SequenceInputStream evaluate lazily? Is there a library that does this automatically? Any help would be appreciated.
Also, in case it isn't obvious, these audio files are generated dynamically, hence the latency. They are text-to-speech audio files.
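To make "lazily" concrete, here is a rough sketch of the behaviour I have in mind. It is only an illustration, not something I have running in the real code base. The names `LazySequenceSketch`, `FutureStreamEnumeration` and `sequenceLazily` are made up for this example, and plain `Callable<InputStream>` objects stand in for my `MP3DataFetcher`. The idea is to hand SequenceInputStream an Enumeration that only calls get() on a future when the stream actually asks for the next element, instead of collecting every result first.

```java
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: all names here are invented for illustration.
class LazySequenceSketch {

    /** Enumeration that blocks on each Future only when the next stream is requested. */
    static final class FutureStreamEnumeration implements Enumeration<InputStream> {
        private final Iterator<Future<InputStream>> futures;

        FutureStreamEnumeration(List<Future<InputStream>> futures) {
            this.futures = futures.iterator();
        }

        @Override
        public boolean hasMoreElements() {
            return futures.hasNext();
        }

        @Override
        public InputStream nextElement() {
            try {
                // The only blocking point: wait for exactly one future.
                return futures.next().get();
            } catch (ExecutionException e) {
                // nextElement() cannot throw checked exceptions, so wrap the cause.
                throw new IllegalStateException(e.getCause());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // restore the interrupt flag
                throw new IllegalStateException(e);
            }
        }
    }

    /** Submits every fetcher, then returns a stream that waits on them one at a time. */
    static InputStream sequenceLazily(List<Callable<InputStream>> fetchers, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<InputStream>> futures = new ArrayList<>();
        for (Callable<InputStream> fetcher : fetchers) {
            futures.add(pool.submit(fetcher));
        }
        pool.shutdown(); // no new tasks; already submitted ones keep running
        return new SequenceInputStream(new FutureStreamEnumeration(futures));
    }
}
```

As far as I can tell, the SequenceInputStream constructor fetches the first element right away, so the first get() would still block there; the remaining futures would only be waited on as reading reaches them. This also only makes the waiting lazy: every request is still submitted up front, so the thread count passed to `sequenceLazily` is what would limit how many downloads run at once.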