
I'm having trouble finding an appropriate solution for this:

I have several databases with the same structure but different data. When my web app executes a query, it must split that query across each database, execute it asynchronously, then aggregate the results from all databases and return them as a single result. Additionally, I want to be able to pass a list of databases the query should run against, as well as a maximum expiration time for query execution. The result must also contain meta information for each database, such as excess execution time.

It would be great if it were also possible to use another kind of data source, such as a remote web service with a specific API, rather than a relational database.

I use Spring/Grails and need a Java solution, but I'll be glad of any advice.

UPD: I want to find an existing solution, maybe a framework or something like that.

Andrej Soroj
  • Ok, what is the question? What have you tried? What is stopping you from doing this? – Peter Lawrey Apr 26 '13 at 10:09
  • So, I want to find an existing solution, maybe a framework or something like that. The only thing I found is UnityJDBC, but it works only with relational databases and has no meta information or expiration time for queries. – Andrej Soroj Apr 26 '13 at 10:56

1 Answer


This is basic OO. You need to abstract what you are trying to achieve - loading data - from the mechanism you are using to achieve it - a database query or a web-service call.

Such a design would usually involve an interface that defines the contract of what can be done and then multiple implementing classes that make it happen according to their implementation.

For example, you'd end up with an interface that looked something like:

public interface DataLoader
{
    public Collection<Data> loadData() throws DataLoaderException;
}
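To make the contract concrete, here is a minimal, self-contained sketch. The Data type, the DataLoaderException class, and the in-memory implementation are illustrative assumptions, not part of Spring or any framework:

```java
import java.util.*;

// Illustrative domain type; in a real app this would be your own model class.
class Data
{
    final String value;

    Data(String value)
    {
        this.value = value;
    }
}

// Illustrative checked exception wrapping whatever the underlying source throws.
class DataLoaderException extends Exception
{
    DataLoaderException(String message, Throwable cause)
    {
        super(message, cause);
    }
}

interface DataLoader
{
    Collection<Data> loadData() throws DataLoaderException;
}

// A trivial in-memory implementation; a JdbcDataLoader or WebServiceDataLoader
// would have the same shape, differing only in where loadData() gets its data.
class InMemoryDataLoader implements DataLoader
{
    private final Collection<Data> data;

    InMemoryDataLoader(Collection<Data> data)
    {
        this.data = data;
    }

    public Collection<Data> loadData()
    {
        return data;
    }
}

public class DataLoaderDemo
{
    public static void main(String[] args) throws DataLoaderException
    {
        // Calling code depends only on the DataLoader interface.
        DataLoader loader = new InMemoryDataLoader(List.of(new Data("a"), new Data("b")));
        System.out.println(loader.loadData().size()); // prints 2
    }
}
```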

You would then have implementations like JdbcDataLoader, WebServiceDataLoader, etc. In your case you would need another type of implementation that, given one or more instances of DataLoader, runs each simultaneously and aggregates the results. This implementation would look something like:

import java.util.*;
import java.util.concurrent.*;

public class AggregatingDataLoader implements DataLoader
{
  private final Collection<DataLoader> dataLoaders;
  private final ExecutorService executorService;

  public AggregatingDataLoader(ExecutorService executorService, Collection<DataLoader> dataLoaders)
  {
    this.executorService = executorService;
    this.dataLoaders = dataLoaders;
  }

  public Collection<Data> loadData() throws DataLoaderException
  {
    Collection<DataLoaderCallable> dataLoaderCallables = new ArrayList<DataLoaderCallable>();

    for (DataLoader dataLoader : dataLoaders)
    {
      dataLoaderCallables.add(new DataLoaderCallable(dataLoader));
    }

    try
    {
      List<Future<Collection<Data>>> futures = executorService.invokeAll(dataLoaderCallables);

      Collection<Data> data = new ArrayList<Data>();
      for (Future<Collection<Data>> future : futures)
      {
        data.addAll(future.get());
      }

      return data;
    }
    catch (InterruptedException e)
    {
      Thread.currentThread().interrupt();
      throw new DataLoaderException("Interrupted while loading data", e);
    }
    catch (ExecutionException e)
    {
      throw new DataLoaderException("A data loader failed", e);
    }
  }

  private class DataLoaderCallable implements Callable<Collection<Data>>
  {
    private final DataLoader dataLoader;

    public DataLoaderCallable(DataLoader dataLoader)
    {
      this.dataLoader = dataLoader;
    }

    public Collection<Data> call() throws DataLoaderException
    {
      return dataLoader.loadData();
    }
  }
}

You'll need to add some timeout and exception handling logic to this, but you get the gist.
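For the timeout, one option (just a sketch, with made-up task bodies standing in for real database calls) is the invokeAll overload that takes a deadline; tasks still running when it expires come back cancelled, which is also a natural place to record per-source meta information:

```java
import java.util.*;
import java.util.concurrent.*;

public class TimeoutDemo
{
    // Runs all tasks against a deadline; cancelled or failed tasks are skipped.
    static List<String> loadWithDeadline(List<Callable<List<String>>> tasks,
                                         long timeoutMillis) throws InterruptedException
    {
        ExecutorService executor = Executors.newFixedThreadPool(tasks.size());
        try
        {
            // Tasks that exceed the deadline are cancelled by invokeAll.
            List<Future<List<String>>> futures =
                executor.invokeAll(tasks, timeoutMillis, TimeUnit.MILLISECONDS);

            List<String> results = new ArrayList<>();
            for (Future<List<String>> future : futures)
            {
                if (future.isCancelled())
                {
                    continue; // this source missed the deadline
                }
                try
                {
                    results.addAll(future.get());
                }
                catch (ExecutionException e)
                {
                    // this source failed; skip it (or fail fast, as required)
                }
            }
            return results;
        }
        finally
        {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException
    {
        Callable<List<String>> fast = () -> List.of("fast-result");
        Callable<List<String>> slow = () -> {
            Thread.sleep(5_000); // stands in for a slow database
            return List.of("slow-result");
        };
        // Only the source that beats the 500 ms deadline contributes results.
        System.out.println(loadWithDeadline(List.of(fast, slow), 500));
    }
}
```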

The other important thing is that your calling code should only ever use the DataLoader interface, so that you can swap different implementations in and out, or use mocks during testing.

Nick Holt
  • Thanks for your detailed answer. I agree, it's not so hard to implement, but I thought I could find something like a map-reduce framework (like Hadoop) for my case. But it seems I will have to write it myself. – Andrej Soroj Apr 26 '13 at 12:22
  • The work in doing map-reduce is knowing how to map the data sets to be reduced, and the reduce itself - both are going to be specific to your problem. Something like Hadoop or most of the distributed caching technologies will do map-reduce, but you still implement their interfaces to do the actual work. My advice: if you can do it easily without introducing a framework (an additional dependency), then keep it simple and skip the framework. – Nick Holt Apr 26 '13 at 12:33