
I have Apache running on a quad-core Ubuntu server on a 384 kbps ADSL line. Files are uploaded by users via a web form and processed by various Python programs running as CGI scripts. Some scripts are CPU-intensive and run at 100% (on one core) for a few minutes; these email the results to the user, so the HTTP session is not held open. Some scripts require larger files (a few MB in total) to be uploaded. At present, usage is very low: a handful of hits per day, and very few, if any, instances of more than one user making use of the services at the same time. I will need to make these services available to a greater number of users in the medium term.

I suspect that the infrastructure I have built does not lend itself easily to scaling. For example, one user has requested that I allow multiple files to be uploaded to the CPU-intensive program. This means that the machine will be busy for a longer period of time. If another user also uploaded multiple files to the same script, the machine may become very busy for an even longer period.

I know discussion-type questions are not permitted here, so I'd like to ask the following specific questions:

What strategies or approaches will I need to consider when making these services scalable -- that is, do I need to rethink the infrastructure completely?

If I made no changes and 10 people each uploaded 10 files to the CPU-intensive program, for example, would all 10 instances spawned by the CGI script just run happily (if slowly) over all their input files? Is it "safe" to have a server running at 100% CPU usage for an hour or two or three?

SabreWolfy

2 Answers


If your Python has been well written and is decently modularised, then it shouldn't be too bad.

What you need to do is look into Celery, and use it as a job queue.

When a user submits a file for processing, it gets queued by Celery, and then will be processed either on the same server, or by a worker node, when the resources are available. Celery is typically backed by RabbitMQ or Redis as the message broker (actual queue server), and those are relatively easy to scale.
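The queue-then-worker pattern is roughly this stdlib sketch: uploads go onto a queue, and a worker pulls them off and processes them when it has capacity. Celery replaces the in-process queue with a broker (RabbitMQ or Redis) and the worker thread with separate worker processes, possibly on other machines; `process_file` here is a hypothetical stand-in for the CPU-intensive script.

```python
import queue
import threading

job_queue = queue.Queue()
results = []

def process_file(path):
    # Hypothetical stand-in for the CPU-intensive work.
    return "done: " + path

def worker():
    # Pull jobs off the queue until a None sentinel arrives.
    while True:
        path = job_queue.get()
        if path is None:
            job_queue.task_done()
            break
        results.append(process_file(path))
        job_queue.task_done()

t = threading.Thread(target=worker)
t.start()

# The web request handler only enqueues and returns immediately;
# the worker does the slow part in the background.
for p in ["a.csv", "b.csv"]:
    job_queue.put(p)

job_queue.join()     # wait until all queued uploads are processed
job_queue.put(None)  # tell the worker to shut down
t.join()
```

The key property is the same one Celery gives you: the HTTP request finishes as soon as the job is enqueued, and the number of concurrent workers (not the number of uploads) bounds the CPU load.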

As far as the "job complete" callback is concerned, there are loads of options available: you can still use email, or you could look at a service like Pusher to send notifications back to the submitting user's browser.

Servers are actually designed to run at 80-90% CPU load; that's roughly where you get the most utilisation for the power you put in.

I suspect however you're hosting this from home (hence the slow ADSL uplink), and that it might actually just be a reused desktop PC, which aren't suitable for server-type duty cycles and loading.

Tom O'Connor

As a start, you should consider using the WSGI interface for your application, then look at a library like Celery or gevent to schedule your application logic as tasks.

CGI is the oldest and most inefficient way of invoking external code, from both a memory and a flexibility perspective. Consider reworking your project to use one of the Python micro frameworks (e.g. bottle.py or Flask); these give you a much more stateful environment in which to connect your logic (your Python code) to the aforementioned libraries.
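For a sense of what the WSGI interface looks like underneath those frameworks, here is a minimal sketch using only the stdlib: a WSGI callable receives the request environ and a `start_response` function, instead of being re-executed as a fresh process per request the way CGI is. (The response text and the idea of handing the heavy work to a task queue are illustrative assumptions, not code from the original post.)

```python
from wsgiref.util import setup_testing_defaults

def application(environ, start_response):
    # A minimal WSGI app. A framework like Flask or bottle.py
    # would provide routing and upload parsing on top of this;
    # the CPU-intensive work would be handed to a task queue
    # rather than done inside the request.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"upload received; results will be emailed\n"]
```

Because the application is a long-lived callable rather than a script launched per hit, the server keeps your code loaded in memory and can be put behind mod_wsgi, gunicorn, or similar.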

Martino Dino