As I'm not good at English, I don't understand what this quote from RFC 3875 means.

For any particular request, the server will identify all or a leading part of this path with an individual script, thus placing the script at a particular point in the path hierarchy.

I interpret it as:

  • The server will find out all or a leading part of the path, using an individual script (what does that even mean?).
  • The server will create or move the script to somewhere (isn't that what 'placing' means?).

which doesn't make sense at all.

Also, I don't understand what 'path hierarchy' means here.

I would be grateful if someone could explain these with examples.

You may think Stack Overflow is an inappropriate place for this kind of question because it is about English, not coding, but I couldn't find a better place to ask because answering it requires knowledge of CGI.
Also, the answer would be a detailed explanation of CGI, so I think it will be helpful to other people who want to learn about CGI.

kjh6b6a68

1 Answer

The "path hierarchy" refers to the directory structure in the path, in other words the segments separated by slashes. In the path /main/sub/foo, "main" is the top-level directory, "sub" is a subdirectory, and "foo" is a file name. These ordered path segments are what is called a hierarchy. The segments at the beginning (the left) are the most important (highest in the hierarchy). The web server checks the path segments one at a time, from left to right, to find the CGI script that they specify.

When identifying which CGI script to use, if the request is for /some/directory/script.cgi/additional/path/info and the file /some/directory/script.cgi exists, then the webserver would use that CGI file for the request. That example shows that the file doesn't have to match the full path, just the "leading part" or "prefix" of the path.
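The left-to-right lookup described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular server is implemented: `resolve_cgi` and `doc_root` are hypothetical names, every regular file is treated as a script, and index documents, rewrite rules, and permission checks are not modeled.

```python
from pathlib import Path
from typing import Optional, Tuple


def resolve_cgi(doc_root: Path, url_path: str) -> Optional[Tuple[str, str]]:
    """Walk url_path left to right and stop at the first regular file.

    Returns (script_path, extra_path), or None when no script is found.
    Hypothetical helper for illustration only.
    """
    segments = [s for s in url_path.split("/") if s]
    current = doc_root
    for i, segment in enumerate(segments):
        if segment in (".", ".."):
            return None  # keep the sketch from walking outside doc_root
        current = current / segment
        if current.is_file():
            # The leading part of the path identifies the script...
            script_path = "/" + "/".join(segments[: i + 1])
            # ...and whatever follows is left over for the script to use.
            rest = segments[i + 1:]
            extra_path = "/" + "/".join(rest) if rest else ""
            return script_path, extra_path
        if not current.is_dir():
            return None  # segment does not exist; a server would answer 404
    return None  # the path ended on a directory; index documents not modeled
```

With a file at /some/directory/script.cgi under the document root, a request for /some/directory/script.cgi/additional/path/info resolves to the pair ("/some/directory/script.cgi", "/additional/path/info").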

The additional path segments after the name of the script file are effectively ignored, at least for the purpose of identifying which file should handle the request. The entire path, including /additional/path/info, is available to the CGI script, which can use it to determine what content to show.
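On the script side, RFC 3875 passes the two parts of the path as the meta-variables SCRIPT_NAME (the leading part that identified the script) and PATH_INFO (the remainder). A minimal sketch of a Python CGI script reading them follows; the helper name `extra_path_segments` is made up for this example.

```python
import os
from typing import List, Mapping


def extra_path_segments(environ: Mapping[str, str] = os.environ) -> List[str]:
    """Split PATH_INFO (the part of the path after the script) into segments."""
    path_info = environ.get("PATH_INFO", "")
    return [segment for segment in path_info.split("/") if segment]


if __name__ == "__main__":
    # Minimal CGI response: header line, blank line, then the body.
    print("Content-Type: text/plain")
    print()
    print("SCRIPT_NAME:", os.environ.get("SCRIPT_NAME", ""))
    print("PATH_INFO segments:", extra_path_segments())
```

For the request in the example above, the script would see PATH_INFO set to /additional/path/info and could, for instance, pick the content to show based on those three segments.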

The CGI subsystem of the web server never moves scripts around. I think you would find it clearer if that were phrased as "The web server may find the CGI script placed at any point in the path hierarchy, including the beginning or the middle, rather than always at the end."

Stephen Ostermiller
  • What if /some/directory has two children, `script.cgi` and `anotherdir`, and the request is for /some/directory/anotherdir/additional/path/info? Can the server execute script.cgi, depending on its configuration or something? – kjh6b6a68 Aug 08 '22 at 13:13
  • If `script.cgi` isn't in the path, it wouldn't be chosen to handle the request. (With the exception of an index document like `index.cgi` which will handle a directory request. But I don't think that will ever apply in the middle of a path, only at the end.) – Stephen Ostermiller Aug 08 '22 at 13:14
  • How can the server know that /some/directory/script.cgi is the script to execute, not /some or /some/directory? – kjh6b6a68 Aug 08 '22 at 13:20
  • If `/some` and `/some/directory` are both directories and not scripts, it would open those directories and look for the next path segment name in those directories. Once it finds a CGI script, it stops processing and executes that script. If it finds a directory, it continues down the path, or looks for an index document if it is at the end of the path. If the next path segment doesn't exist, it will return a 404 Not Found. – Stephen Ostermiller Aug 08 '22 at 13:22
  • It will use system calls to determine if something is a file or a directory. – Stephen Ostermiller Aug 08 '22 at 13:23
  • Thanks. But how did you know this? Is it specified somewhere in RFC 3875? – kjh6b6a68 Aug 08 '22 at 13:26
  • I've used Apache's implementation of this RFC and observed how it works. – Stephen Ostermiller Aug 08 '22 at 13:27
  • Wait, but the first sentence of section 3.3 says that the mapping from URI to script is defined by the server implementation or configuration. Based on your explanation, isn't there nothing to configure? – kjh6b6a68 Aug 08 '22 at 13:32
  • Oh, maybe it means the mapping between the path in the URI and the real path on the host? I think that makes sense. – kjh6b6a68 Aug 08 '22 at 13:36
  • It is very common to configure this with rewrite rules. Rewrite rules subvert this whole system. Websites usually desire pretty URLs and not paths tied to directories and script names. You can have any URL powered by any script by rewriting the request for that URL to some specific script. – Stephen Ostermiller Aug 08 '22 at 13:53