
I'm looking for a solution to what appears to be a complex problem. I need to find the fastest and most reliable way to create one "Master" database (system of record) from data held in multiple other "distributed" systems of record. Let me break down the problem: I have N systems of record, each with a web service attached that responds to requests for data. All N web services have standardized the response object schema, so the XML/JSON response data schema is exactly the same for each web service. The mission is to create a single "Master" database that reflects all N systems of record, in the fastest way possible (essentially a merge of all N databases, run periodically by some procedure/application/program).

The only way to obtain the data from each of the "distributed" systems of record is by interacting with their respective web services. This is the worldwide standard agreed upon by all of the N system-of-record owners.

What is the most efficient way to tackle this problem?

Here is one approach that was considered, but I don't think it is optimal: create a C# program that concurrently/asynchronously requests data from each of the web services and pushes that data into a staging environment in SQL Server. SQL Server then runs procedures to merge that data into the master database. Currently this takes about 17 hours to complete, which is obviously a long time.
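For illustration, here is a minimal sketch of what that considered approach could look like, assuming a flat list of service URLs, a staging table named dbo.StagingRecords, and a merge stored procedure named dbo.MergeStagingIntoMaster (all of these names, and the placeholder parsing, are hypothetical):

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;

public static class StagingLoader
{
    private static readonly HttpClient Http = new HttpClient();

    public static async Task RunAsync(IEnumerable<string> serviceUrls, string connectionString)
    {
        // Fetch from all web services concurrently rather than one after another.
        string[] responses = await Task.WhenAll(
            serviceUrls.Select(url => Http.GetStringAsync(url)));

        // Shape the responses into a DataTable matching the staging table
        // (real parsing of the standardized XML/JSON schema is omitted here).
        var table = new DataTable();
        table.Columns.Add("Id", typeof(string));
        table.Columns.Add("Payload", typeof(string));
        foreach (var body in responses)
            table.Rows.Add(Guid.NewGuid().ToString(), body); // placeholder row

        using (var connection = new SqlConnection(connectionString))
        {
            await connection.OpenAsync();

            // Bulk copy into staging is far faster than row-by-row inserts.
            using (var bulk = new SqlBulkCopy(connection) { DestinationTableName = "dbo.StagingRecords" })
            {
                await bulk.WriteToServerAsync(table);
            }

            // Let SQL Server merge staging into the master database.
            using (var merge = new SqlCommand("dbo.MergeStagingIntoMaster", connection)
                { CommandType = CommandType.StoredProcedure })
            {
                await merge.ExecuteNonQueryAsync();
            }
        }
    }
}
```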

Here is a basic drawing of the moving parts; the "??" in the middle represents the solution.

Judy007
  • Your approach seems fine, but your question is lacking some details. You wrote that this approach takes about 17 hours to complete. How many records are transferred in that time? Where are the bottlenecks? How are you sending the records to your main SQL Server? – Zohar Peled Jul 21 '16 at 04:42
  • Hi Zohar, the approach I am using works, but it doesn't seem optimal to me. Yes, in those 17 hours about 159,273 records are saved to the DB. The biggest bottleneck right now is that each request to a web service takes a few seconds, so the total processing time comes to around 17 hours. I can use the Task Parallel Library (TPL) in C# to run the requests on multiple threads, but I still feel the solution is more complicated than it needs to be. What do you think? Again, the goal is to minimize the total time to sync. – Judy007 Jul 21 '16 at 18:01
  • My best solution, without exploring queues and the "Competing Consumers" pattern, is to use the Task Parallel Library: aggregate all the URIs that need to be requested from the XML web services, create tasks (perhaps in batches of 100 at a time), and run the GET requests in parallel (a rough sketch follows these comments). That would cut the time down from 17 hours (requests in series) to much less, perhaps under 2 hours (requests in parallel). – Judy007 Jul 22 '16 at 16:51
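A minimal sketch of that batching idea, assuming a flat list of URIs and a batch size of 100 (the class and method names are hypothetical):

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;

public static class BatchedFetcher
{
    private static readonly HttpClient Http = new HttpClient();

    // Issue the GET requests in parallel batches instead of strictly in series.
    public static async Task<List<string>> FetchInBatchesAsync(IReadOnlyList<string> uris, int batchSize = 100)
    {
        var results = new List<string>();
        for (int i = 0; i < uris.Count; i += batchSize)
        {
            // Start one task per URI in the current batch and wait for the whole batch.
            var batch = uris.Skip(i).Take(batchSize)
                            .Select(uri => Http.GetStringAsync(uri));
            results.AddRange(await Task.WhenAll(batch));
        }
        return results;
    }
}
```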

1 Answer


Thank you Zohar. It turns out that I will still attack this problem with the Task Parallel Library. I will continue to use tasks, but I have decided to create batches of tasks (5-10) that all execute asynchronously, and to work with a ConcurrentQueue for all the web URLs. This will allow me to minimize the total time.

There is definitely a good reason to focus on the bottlenecks, and each web server has a different "bottleneck rate" (for lack of a better term). I will configure my algorithm to create N tasks, with N being configurable per web service. So, for instance, if web service 1 is highly performant, I will create more tasks (threads) to run its requests in parallel. Thank you.
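A rough sketch of that per-service approach, assuming each service gets its own ConcurrentQueue of URLs and a configurable number of worker tasks (the names and result handling are illustrative, not the final implementation):

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;

public static class PerServiceFetcher
{
    private static readonly HttpClient Http = new HttpClient();

    public static async Task FetchServiceAsync(IEnumerable<string> urls, int workerCount,
                                               ConcurrentBag<string> results)
    {
        var queue = new ConcurrentQueue<string>(urls);

        // A highly performant service can be configured with more workers than
        // a slow one; each worker drains the shared queue of URLs for that service.
        var workers = Enumerable.Range(0, workerCount).Select(async _ =>
        {
            while (queue.TryDequeue(out var url))
            {
                results.Add(await Http.GetStringAsync(url));
            }
        }).ToList();

        await Task.WhenAll(workers);
    }
}
```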

Judy007