
Recently, I got into a discussion with a seasoned architect about the ideal architecture and design for a multi-tenant web application that runs in a web farm. The application's only job is to let users upload any number of Excel files, which the system processes to generate very complex reports. Processing takes a long time (about an hour per file; take that as a constraint). Hence, after uploading, users wait for a notification from the system to download the generated reports.

At first glance the requirement looks pretty simple, but the expectation is that the application must be 100% scalable. We discussed various solutions and architectures but did not find any of them satisfactory. I would like members of this community to propose a solution, with a design and the technologies involved. This is not a professional assignment; it's just a survey of architects' views on building scalable applications versus merely cloud-ready applications, where it's easy to scale the infrastructure rather than focusing on the application's own scalability.

  • Are all the Excel processing jobs independent of each other? If there are no dependencies, it looks like quite an easy problem as long as you just keep scaling the hardware, as in a cloud. – computinglife May 02 '13 at 18:46
  • Yes, all processing is independent, but the challenge was to make the application itself scalable rather than just pumping in more hardware. We should consider scaling out as the last option, once we reach the threshold where the application cannot be scaled up any further. – iarchitect May 08 '13 at 13:07
  • Well, then your question should be changed to be about efficiency rather than scalability, since you already have scalability. If you change your question to be specifically about converting Excel documents efficiently, you would get better answers. – computinglife Jun 27 '13 at 06:03

1 Answer


Users upload Excel files through the site, and their credentials are passed along with the upload. The backend registers the request in the database and returns a request ID (a GUID, for example). The web request ends there.
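The registration step could be sketched as follows. This is only an illustration in Python with SQLite, not the .NET stack the answer has in mind; the `requests` table, its columns, and the `create_schema` and `register_request` functions are all hypothetical names chosen for the example.

```python
import sqlite3
import uuid


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical table: one row per uploaded file awaiting processing.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS requests (
            id        TEXT PRIMARY KEY,
            tenant    TEXT NOT NULL,
            file_path TEXT NOT NULL,
            status    TEXT NOT NULL DEFAULT 'pending'
        )
        """
    )


def register_request(conn: sqlite3.Connection, tenant: str, file_path: str) -> str:
    """Record an upload and return its request id; no processing happens here."""
    request_id = str(uuid.uuid4())
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO requests (id, tenant, file_path) VALUES (?, ?, ?)",
            (request_id, tenant, file_path),
        )
    return request_id


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    print(register_request(conn, "tenant-a", "/uploads/report.xlsx"))
```

The point of the design is that the web tier only writes one small row and returns immediately, so the hour-long processing never ties up a web server.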

A Windows service runs in the background and polls the database for new requests to process. You could use Quartz.NET to schedule a number of parallel jobs that handle requests. The actual request handling (processing the Excel files and generating the reports) is delegated to load-balanced WCF services, so the more WCF services you have, the more parallel Quartz jobs can be scheduled. Another type of Quartz job can be scheduled to send emails once requests have been handled.
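When several jobs poll the same table, each request must be picked up by exactly one of them. A minimal sketch of one way to do that is a conditional update that only succeeds while the row is still pending. It is written in Python with SQLite for illustration rather than with Quartz.NET and WCF, and the reduced `requests` table, the `worker` column, and the function names are assumptions made for the example.

```python
import sqlite3
from typing import Callable, Optional


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical, reduced table: only the columns the worker needs.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS requests (
            id     TEXT PRIMARY KEY,
            status TEXT NOT NULL DEFAULT 'pending',
            worker TEXT
        )
        """
    )


def claim_next_request(conn: sqlite3.Connection, worker_id: str) -> Optional[str]:
    """Claim one pending request for this worker; return its id, or None."""
    while True:
        row = conn.execute(
            "SELECT id FROM requests WHERE status = 'pending' "
            "ORDER BY rowid LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        with conn:
            # The status check in WHERE makes the claim atomic: if another
            # worker got there first, no row matches and rowcount is 0.
            cur = conn.execute(
                "UPDATE requests SET status = 'processing', worker = ? "
                "WHERE id = ? AND status = 'pending'",
                (worker_id, row[0]),
            )
        if cur.rowcount == 1:
            return row[0]
        # Lost the race for this row; look for the next pending one.


def poll_once(
    conn: sqlite3.Connection, worker_id: str, process: Callable[[str], None]
) -> int:
    """Drain all currently pending requests; return how many were handled."""
    handled = 0
    while (request_id := claim_next_request(conn, worker_id)) is not None:
        process(request_id)  # stand-in for the long-running report generation
        with conn:
            conn.execute(
                "UPDATE requests SET status = 'done' WHERE id = ?", (request_id,)
            )
        handled += 1
    return handled
```

A scheduler would call `poll_once` at a fixed interval from each worker. Because the claim is a single conditional `UPDATE`, adding workers does not require any coordination beyond the database itself.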

The site polls at regular intervals to check the status and progress of requests (or of a specific request) and lets users manage them. For requests that have completed, the report can be downloaded.
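The status polling might look like the following sketch. The `wait_for_report` function, the status strings, and the default interval and timeout are hypothetical; `get_status` stands in for whatever call the site makes to look up a request by its ID.

```python
import time
from typing import Callable


def wait_for_report(
    get_status: Callable[[], str],
    interval_s: float = 30.0,
    timeout_s: float = 7200.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status until it reports a final state or the timeout passes."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status in ("done", "failed"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("request did not finish before the timeout")
        sleep(interval_s)  # injectable so the loop can be tested without waiting
```

With processing times around an hour, a long polling interval keeps the extra load on the database small compared with the work of generating the reports.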

I think this is a very scalable solution that is also loosely coupled.

L-Four
  • Dear L-Four, thanks for your suggestion. It looks like a feasible and decent solution, but how do we ensure the scalability of the DB operations, such as selects, inserts, and updates at a large scale? Do all these services access the DB at the same time? Will that not tax the database too much? Similarly, how do we ensure that so many requests coming from the client are handled efficiently, e.g. parsing the database? – iarchitect May 08 '13 at 09:12
  • I think the database is rarely the bottleneck, but of course it depends on what you are trying to do. For example, to handle concurrent reading and writing, use the correct locking mechanism (like http://ludwigstuyck.wordpress.com/2013/02/28/concurrent-reading-and-writing-in-an-oracle-database/); for bulk inserts, use the proper technique (like http://ludwigstuyck.wordpress.com/2013/02/27/bulk-insert-into-an-oracle-database/); and so on. So try to come up with concrete non-functional requirements first, then do the technical design. – L-Four May 13 '13 at 06:03
  • Thanks, I'll leave it open for further comments. – iarchitect May 17 '13 at 11:01