
I have a rather resource-intensive CGI that takes quite a long time to start sending data. We've seen quite a few cases where impatient people reload a couple of times, which triggers additional runs of the CGI, or where the client times out and drops the connection but the CGI keeps running.

Is there any good way to detect when this has happened? It doesn't even need to be within the CGI itself (and likely better if it isn't -- it hands off to another program that I don't have control of), but could be a cron job that runs every so often to look for dead connections to reap.

I'm currently using Apache, but this is such a problem that I'd be willing to run some other webserver if it has provisions to deal with it (or a way to let me monitor for the problem).

Joe H.

3 Answers

5

Usually, you can't detect a broken connection until you start writing back to the user. Until then, your process will continue doing its work without noticing that the user aborted the connection. This post is related even though it talks about PHP; the concept should be the same.

There are a few things you can try:

  1. Do the time-consuming work in the background. When a user requests the CGI, don't execute the task as a normal blocking call. Just return something to the user to say that the request is being processed. Of course, you need some way to update the view, or another page where the user can check the job status using a request ID or IP.
  2. Send data back to the client as soon as possible, and exit if the send fails (an indication of a broken connection). For example, you can send the job's progress every few seconds or minutes.

If you save the currently running jobs in a database, you can store the request ID and/or client IP address with each one. That lets you detect and ignore duplicate requests for the same resource, telling the user "your request is being processed".
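The duplicate check does not strictly need a database. As a rough sketch, assuming a bash wrapper and a lock directory per job (the `LOCK_ROOT` path and the `claim_job`/`release_job` names are made up for illustration), the atomicity of `mkdir` is enough to let only the first of several identical requests start the job:

```bash
#!/usr/bin/env bash
# Hypothetical lock location; any directory writable by the CGI user would do.
LOCK_ROOT="${LOCK_ROOT:-${TMPDIR:-/tmp}/cgi-jobs}"

# Turn an arbitrary request key (IP, query string, ...) into a safe directory name.
job_dir() {
    printf '%s/job-%s' "$LOCK_ROOT" "$(printf '%s' "$1" | tr -c 'A-Za-z0-9._-' '_')"
}

# claim_job KEY: succeeds only for the first caller that uses KEY.
# mkdir either creates the directory or fails, so two simultaneous
# requests cannot both win.
claim_job() {
    mkdir -p "$LOCK_ROOT" && mkdir "$(job_dir "$1")" 2>/dev/null
}

# release_job KEY: drop the claim once the job has finished.
release_job() {
    rmdir "$(job_dir "$1")" 2>/dev/null
}
```

The wrapper would call something like `claim_job "$REMOTE_ADDR-$QUERY_STRING"` before starting the expensive program, answer with a "request is being processed" message when the claim fails, and call `release_job` with the same key when the job ends. Stale claims left behind by a crashed job would still need some cleanup, which this sketch does not attempt.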

Khaled
  • I wish I had that much control over the process that I'm handing off to. (my CGI is a wrapper to a program that's intended to be run on the command line) The problem is that the underlying program sometimes needs to load data from another server (which could be bogged down ... they're currently taking 10+ minutes for some to transfer)... and sometimes that other server needs to load the data from tape (which doesn't happen automatically ... I have to figure out that it happened, and then fire off a request out-of-band) – Joe H. May 23 '15 at 11:08
  • @JoeH.: Can you run it in the background and monitor its progress somehow? In the wrapper CGI, you can kill it when you detect a client abort. – Khaled May 23 '15 at 11:20
  • I might have to ... right now, I hand off to the program entirely rather than run the output through the wrapper (as it might generate 2GB of output). ... but if the only way to sense the disconnect is a failure on writing, and the writing might not start for 10min, there might not be any gains. – Joe H. May 23 '15 at 12:36
  • I don't know the details of the job you are running, but I think you can probably write some dummy data (like progress) every minute even before you have real data. – Khaled May 23 '15 at 14:19
  • Unfortunately, no ... it's a file downloader, and the clients are pretty stupid, not your standard web browser. Not a single one supports a 503 response, and most don't even handle a `Location:` to try to give them a URL w/ serial number to try to buy more time. The only thing that I *might* be able to slowly trickle to them is an HTTP header, but to do that would mean I could no longer send other HTTP status if something goes wrong. – Joe H. May 23 '15 at 14:40
2

Warning: this information may be obsolete. See last paragraph.

I remember having had the same problem, and solving it with an nph (non-parsed headers) CGI script.

Normally, Apache collects all the headers from your script and, once it has finished reading them, amends them with some standard headers you didn't provide. This also means that as long as you don't finish the headers, Apache sends nothing to the client at all.

With an nph script, you have to provide all the headers yourself, but Apache will send them to the client immediately, and will send your CGI script a SIGPIPE once the client disconnects. So you can send some X-Slowly-Counting-Part-nnn: yes header every few seconds to prevent timeouts on the client, and you'll get notified if the client breaks the connection.

This still leaves the problem that you have to send the HTTP status first, but if you send a 'Content-Length: 0', or maybe a 'Content-Length: 1', and close the connection without sending any content, your file downloader should assume a network error and act accordingly.

You will probably have to pipe the output of the other program through your process, but this shouldn't be a major performance hit, at least if you're on Linux and use the sendfile(2) system call.

The problem with all of this is that I used it at least 10 years ago, probably on Apache 1.3, and googling for apache cgi nph didn't yield anything useful. So maybe the nph feature has been taken out in the meantime, but then, maybe not; I admit I didn't look very hard.
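To make the keepalive idea concrete, here is a minimal bash sketch, not the original script. It assumes the slow program was started in the background and its PID is known; the function name `wait_with_keepalive` and the default interval are invented for illustration. SIGPIPE is ignored so that a failed write shows up as a non-zero status from `printf` instead of killing the script outright.

```bash
#!/usr/bin/env bash
# Ignore SIGPIPE so a write to a vanished client fails with an error
# status rather than terminating this script.
trap '' PIPE

# wait_with_keepalive PID [INTERVAL]
# While PID is running, emit a dummy header every INTERVAL seconds.
# Returns 0 if PID exits by itself; returns 1 (after killing PID)
# as soon as a write fails, which is taken to mean the client is gone.
wait_with_keepalive() {
    local pid=$1 interval=${2:-5} n=0
    while kill -0 "$pid" 2>/dev/null; do
        n=$((n + 1))
        if ! printf 'X-Slowly-Counting-Part-%d: yes\r\n' "$n" 2>/dev/null; then
            kill "$pid" 2>/dev/null
            return 1
        fi
        sleep "$interval"
    done
    return 0
}
```

In an nph script (Apache recognises these by the `nph-` file name prefix), the wrapper would first print the status line itself, for example `HTTP/1.1 200 OK` followed by CRLF, start the slow program in the background, call `wait_with_keepalive "$!"`, and only then finish the headers and pass the program's output through. How promptly a write fails after the client disconnects depends on the server and on buffering, so treat the timing as something to test rather than a guarantee.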

Guntram Blohm
  • Thanks. NPH still exists. (it's used for 'server push' type applications) (now to go and see if I can get it working w/ my problematic program) – Joe H. May 23 '15 at 23:08
2

Old question, but I had the same problem and solved it by checking whether the connection is still established.

In my case, I'm running a bash script on the server, and the environment variables are exported by mod_cgi. I believe this solution will work for any program or script running as a CGI:

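# Look for an established TCP connection between this server port and the client.
if ! ss -nt state established "( sport = :$SERVER_PORT and dport = $REMOTE_ADDR:$REMOTE_PORT )" 2>/dev/null | grep -q "$REMOTE_ADDR:$REMOTE_PORT"; then
    # Client closed browser/connection: stop here (an empty "then" body is a bash syntax error)
    exit 1
fi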
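Since a one-off check only helps at the moment it runs, one way to use it is to poll while the slow program runs in the background. The following is a sketch under that assumption; the function names `client_connected` and `kill_when_client_gone` and the 10-second default are made up here, and the `ss` filter is simply the one from the snippet above.

```bash
#!/usr/bin/env bash
# client_connected: true while ss still lists an established connection
# from the client (SERVER_PORT, REMOTE_ADDR, REMOTE_PORT come from the CGI environment).
client_connected() {
    ss -nt state established \
        "( sport = :$SERVER_PORT and dport = $REMOTE_ADDR:$REMOTE_PORT )" 2>/dev/null |
        grep -q "$REMOTE_ADDR:$REMOTE_PORT"
}

# kill_when_client_gone PID [INTERVAL]
# Poll every INTERVAL seconds while PID is running.
# Returns 0 if PID finishes by itself; returns 1 after killing PID
# once the client connection has disappeared.
kill_when_client_gone() {
    local pid=$1 interval=${2:-10}
    while kill -0 "$pid" 2>/dev/null; do
        if ! client_connected; then
            kill "$pid" 2>/dev/null
            return 1
        fi
        sleep "$interval"
    done
    return 0
}
```

The wrapper would start the real program with `&` and then call `kill_when_client_gone "$!"`. If something sits between the client and the web server (a proxy or load balancer), `REMOTE_ADDR` and `REMOTE_PORT` describe that hop rather than the end user, so the check would only see the intermediate connection.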
Cycne