-2

I am working on a hobby project website. Each user has up to several hundred megabytes of data specific to them stored in my database. The user can run various types of statistical analysis on the data that will result in graphs for the user to see the results. The user will do all this from the browser.

My question is how do I set up the server side? There needs to be support for at least a few thousand concurrent users. Each user is expected to make a few queries on their data set in a session. Clearly I cant just have a single web server.

What I am thinking so far is having a web server take in requests then a script on the web server sends a request to a cluster of several machines that do the number crunching. The cluster contains a master and several workers. All requests come to the master. The master monitors the workers and sends the request to the best available worker. The worker crunches out the numbers and sends a response back to the web server. The web server then sends the data to the user where the graphs are built.

Does that idea work? If so, how would I create a connection to the master? What would its contact info be? Is there good load balancing software out there so that I wouldn't have to develop the master?

Also, how do companies do things similar to this, or rather what is the best way to solve this problem? I tried to look it up, but could not find any specifics. Thanks in advance.

xirtam
  • 3
  • 2

1 Answers1

0

This is traditionally done with a pub/sub model. The exact implementation depends on what language/platform you are using, but the basic implementation is thus:

  1. The client creates one of these "query objects" which is written to a database, and drops a message on a message queue.
  2. The client starts polling the database for the results (when the application is web based, otherwise a response queue is often set up).
  3. The workers poll the queue for work when they go idle. When they find a message, they pick it up, run the "job/query/whatever", and write the results back to the database for the client to pick up.

There are varieties on this theme, like when the request/response is small enough to fit in the queue message itself, you can eliminate the intermediate database, but this gets ugly in a web-based polling model, as you need to get the right response back to the right http response thread.

Rob Conklin
  • 8,806
  • 1
  • 19
  • 23
  • Thanks. What role does the web server typically play? For example, let's say I have this pub/sub system as a separate application and I want to integrate it into the web site. The backend is just a single web server with a database. Does the browser client contact the web server, then the web server replies with contact info for the pub/sub system so the client can directly communicate with the system? Or does all communication from client go through the web server, and the web server acts as an intermediary between the client and pub/sub system? Ideally, I don't want the application exposed – xirtam Oct 27 '15 at 18:56
  • Yes, the web server acts as the intermediary. When the client makes the request, the web server writes the query request to the database, and drops the message on the queue. The web server also checks the database when the status of the query is requested, and sends the response when the query is complete. – Rob Conklin Oct 27 '15 at 19:43