Could anyone please point me in the right direction on how to design/build a web service client that consumes terabytes of data and performs computations on the retrieved data?
I inherited a project at my new job. The project was designed and started by the group a few weeks before I joined the team. It retrieves data from several web services (SOAP & REST) and performs computations on that data before storing it in a database, displaying it to users, and generating reports.
The process of getting the data involves pulling data from web services A, B, and C, and using those responses to make further requests to web services X, Y, and Z (we have no control over the web service producers). The current implementation is very slow, and most of the time we run out of memory when performing computations on the retrieved data, which runs into terabytes or more. The current stack is Maven/Spring.
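To make the memory problem concrete, here is a minimal sketch of the direction I'm considering: stream each response and process one record at a time instead of materializing the whole payload in memory. This assumes Spring's RestTemplate and Jackson's streaming parser on the classpath; the URL and the per-record handler are placeholders.

```java
import java.io.IOException;
import java.io.InputStream;

import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;
import org.springframework.http.HttpMethod;
import org.springframework.web.client.RestTemplate;

public class StreamingClient {

    private final RestTemplate restTemplate = new RestTemplate();
    private final JsonFactory jsonFactory = new JsonFactory();

    // Streams the response body from service A and processes one record at a
    // time, so memory use stays roughly constant regardless of response size.
    public void processServiceA(String url) {
        restTemplate.execute(url, HttpMethod.GET, null, response -> {
            try (InputStream in = response.getBody();
                 JsonParser parser = jsonFactory.createParser(in)) {
                while (parser.nextToken() != null) {
                    // Hand off each record as soon as its object starts,
                    // then discard it before reading the next one.
                    if (parser.currentToken() == JsonToken.START_OBJECT) {
                        handleRecord(parser);
                    }
                }
            }
            return null;
        });
    }

    // Placeholder for the real per-record computation, e.g. triggering the
    // follow-up request to service X based on this record's fields.
    private void handleRecord(JsonParser parser) throws IOException {
        parser.skipChildren(); // stand-in: skip the record's contents
    }
}
```

The point is that nothing bigger than a single record is ever held in memory; whether that's workable depends on whether our computations really can be done record-by-record rather than over the whole dataset at once.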
I am at the point of drawing up a new design for this project (introducing some caching, etc.), but I would appreciate suggestions from anyone who has encountered this kind of problem before.
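For the caching part, here is roughly what I have in mind, sketched with Spring's cache abstraction; the service name, URL, and cache store are made up for illustration, and at this scale an external store (e.g. Redis) with eviction would likely replace the in-memory map:

```java
import org.springframework.cache.CacheManager;
import org.springframework.cache.annotation.Cacheable;
import org.springframework.cache.annotation.EnableCaching;
import org.springframework.cache.concurrent.ConcurrentMapCacheManager;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.stereotype.Service;
import org.springframework.web.client.RestTemplate;

@Configuration
@EnableCaching
class CacheConfig {
    // Simple in-memory cache manager, for illustration only.
    @Bean
    CacheManager cacheManager() {
        return new ConcurrentMapCacheManager("lookups");
    }
}

@Service
class LookupService {
    private final RestTemplate restTemplate = new RestTemplate();

    // Identical requests are served from the cache, avoiding repeated
    // round-trips to the upstream service for data that rarely changes.
    @Cacheable("lookups")
    public String fetchLookup(String id) {
        return restTemplate.getForObject(
            "https://service-x.example.com/lookup/{id}", String.class, id);
    }
}
```

This would mainly help with the A/B/C → X/Y/Z chaining, where many records presumably trigger the same follow-up lookups.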
Aside from the obvious, are there any special tricks or approaches to this? I know this might sound like a basic question to some people, but any pointers would help.