PATH_INFO value
(1.1) By sodface on 2022-04-16 19:43:32 edited from 1.0
I've used PATH_INFO in several similar but distinct use cases, and it's great to have it available. However, there are a couple of cases where I think the value is somewhat unexpected and conflicts with the source code comments, specifically when the value of PATH_INFO is "/". For example:
http://www.example.com/section/ PATH_INFO="/"
http://www.example.com/section/index.cgi PATH_INFO=""
http://www.example.com/section/index.cgi/ PATH_INFO="/"
From the source code comments:
/* Locate the file in the filesystem. We might have to append a name like "/home" or "/index.html" or "/index.cgi" in order to find it. Any excess path information is put into the zPathInfo variable. */
and:
/* Part of the pathname past the file */
In the first example, althttpd has to append the file name, but there is no excess path information, so I would expect PATH_INFO to be "" rather than "/".
The second example is expected.
The third example is also sort of expected though not especially useful?
I guess I'm wondering if PATH_INFO should ever be "/"? It's easy to account for in the CGI script, but it's something to be aware of.
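As a rough illustration of accounting for this in a CGI script, the sketch below treats an unset, empty, or "/" PATH_INFO as equivalent. The helper name `effective_path_info` is hypothetical and not part of althttpd; the only thing assumed is the standard PATH_INFO environment variable.

```python
import os


def effective_path_info(environ=None):
    """Return PATH_INFO with "", "/", and unset all collapsed to ""."""
    if environ is None:
        environ = os.environ
    value = environ.get("PATH_INFO", "")
    # Treat a lone "/" (a single empty path segment) as no extra path.
    return "" if value == "/" else value
```

A script that calls this helper instead of reading the variable directly would see the same value for all three example URLs above.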
(2) By Stephan Beal (stephan) on 2022-04-17 11:50:51 in reply to 1.1
I guess I'm wondering if PATH_INFO should ever be "/"?
FWIW, i've been searching for a definitive answer to this but have yet to find one - all sources seem to be equally vague, or even self-contradictory (e.g. wikipedia), on it. The real answer is probably "whatever Apache does," but i no longer have an apache installation to quickly try that out. i'm currently tied up with dogsitting but will experiment with this later on.
(3) By Stephan Beal (stephan) on 2022-04-17 12:25:19 in reply to 1.1
I guess I'm wondering if PATH_INFO should ever be "/"?
Found it:
"" and "/" are both valid...
https://datatracker.ietf.org/doc/html/draft-coar-cgi-v11#section-4.1.5
Says, in part:
The PATH_INFO variable specifies a path to be interpreted by the CGI
script. It identifies the resource or sub-resource to be returned by
the CGI script, and is derived from the portion of the URI path
hierarchy following the part that identifies the script itself.
Unlike a URI path, the PATH_INFO is not URL-encoded, and cannot
contain path-segment parameters. A PATH_INFO of "/" represents a
single void path segment.
PATH_INFO = "" | ( "/" path )
path = lsegment *( "/" lsegment )
lsegment = *lchar
lchar = <any TEXT or CTL except "/">
The answer to your question seems to lie in "derived from the portion of the URI path hierarchy following the part that identifies the script itself." If it has a slash after the script name, PATH_INFO should contain that slash.
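One way to read that rule is as a plain prefix split: whatever follows the script's path is PATH_INFO, and per the grammar it is either empty or begins with "/". The sketch below is only an illustration of that reading; the function name `path_info_after_script` is hypothetical, and it assumes both arguments are already URL-decoded and that the server has already decided which path identifies the script.

```python
def path_info_after_script(uri_path, script_name):
    """Return the part of uri_path that follows script_name.

    Both arguments are assumed to be already URL-decoded paths.
    """
    if not uri_path.startswith(script_name):
        raise ValueError("script_name is not a prefix of uri_path")
    rest = uri_path[len(script_name):]
    # Per the grammar, PATH_INFO is either "" or starts with "/".
    if rest and not rest.startswith("/"):
        raise ValueError("script_name does not end on a segment boundary")
    return rest
```

Under this reading, "/section/index.cgi" yields "" and "/section/index.cgi/" yields "/". The "/section/" case depends on what the server treats as the script path once it has appended "index.cgi", which this sketch does not decide.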
(4) By sodface on 2022-04-17 13:29:08 in reply to 3
Thanks Stephan, and I apologize because I did zero research on this myself prior to posting. After reading your first reply, I found the same RFC section and came to the same conclusion.
Though I'm still inclined to think that in the examples in my original post, per the RFC, PATH_INFO should be:
""
""
"/"
For my use cases so far it doesn't really matter which value is used. What I was looking for is consistency: either all three "" or all three "/", with "" making the most sense to me in my examples.
(5.1) By sodface on 2022-04-17 13:49:46 edited from 5.0 in reply to 4
Here's one example page that shows Apache values (via PHP) with PATH_INFO set to "/":
https://www.zytrax.com/run/ssi.php/
If you take the trailing slash off, the PATH_INFO entry goes away altogether, which I assume would mean it's not set (or set but null?).
(7.1) By sodface on 2022-04-21 11:38:00 edited from 7.0 in reply to 4
Perhaps a better test page: https://whimsy.apache.org/test.cgi
There are some differences in the value of PATH_INFO between this test page, which I assume accurately illustrates Apache's handling of it, and althttpd.
The biggest difference seems to be that Apache is normalizing (or translating?) the excess path info.
Multiple slashes become one for example:
https://whimsy.apache.org/test.cgi/foo///bar/////baz
REQUEST_URI /test.cgi/foo///bar/////baz
PATH_INFO /foo/bar/baz
Even ".." is translated:
https://whimsy.apache.org/test.cgi/foo/bar/../baz
REQUEST_URI /test.cgi/foo/baz
PATH_INFO /foo/baz
But that seems to occur earlier, since REQUEST_URI already shows the resolved path.
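For scripts that want the same PATH_INFO under both servers, the normalization observed above can be approximated inside the script. This is a sketch of the observed behavior only, not Apache's actual algorithm; the function name `normalize_path_info` is hypothetical, and it assumes the input is either empty or begins with "/". Preserving a trailing slash is a choice made here, not something the test page output confirms.

```python
def normalize_path_info(path):
    """Collapse repeated slashes and resolve "." and ".." segments."""
    if not path:
        return ""
    out = []
    for segment in path.split("/"):
        if segment in ("", "."):
            continue  # empty segments come from "//" or a leading/trailing "/"
        if segment == "..":
            if out:
                out.pop()  # step back one level, never above the root
            continue
        out.append(segment)
    result = "/" + "/".join(out)
    # Keep a trailing slash if the input had one and the result is not just "/".
    if path.endswith("/") and out:
        result += "/"
    return result
```

With the two example paths above, "/foo///bar/////baz" becomes "/foo/bar/baz" and "/foo/bar/../baz" becomes "/foo/baz".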
And finally, PATH_INFO seems to be unset when there is no excess path info:
https://whimsy.apache.org/test.cgi
(PATH_INFO no longer listed in output)
Anyway, I'm not complaining here, just noting that scripts using PATH_INFO may need to be revised to work correctly with althttpd.
(8) By Stephan Beal (stephan) on 2022-04-21 12:43:30 in reply to 7.1
Perhaps a better test page: https://whimsy.apache.org/test.cgi
That's a nice feature. Perhaps we should post a similar page for althttpd.
Multiple slashes become one for example:
That's arguably a bug, but also arguably a convenience for the user. It probably stems from the design guidelines common in the 90s/00s which said something like "be lax in what data you accept and strict in what you emit." The "being lax" guideline turned out to be a huge long-term pain in the butt at all levels of the software development chain (and was, i opine, PHP's primary failing - it's plagued by historical baggage from its early wild and carefree days).
Even ".." is translated:
Apache apparently normalizes those inputs before passing them on. Althttpd doesn't do that. Where Apache's a general-purpose web server, and therefore has to cater to wider needs, althttpd is a very special-purpose server, specifically written for Richard's needs, and his needs have apparently never included access to resources via paths with ".." in their names. (The first time he needs it, though, you can be sure it will be added ;).)
And finally, PATH_INFO seems to be unset when there is no excess path info:
It might actually be set, but not be emitted if it's empty. We can't know that without seeing the test.cgi source code.
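A test page that makes the distinction visible would have to report "unset" and "set but empty" separately. The following is a minimal sketch of such a script, not the code behind the whimsy test page; the helper `describe_cgi_vars` and its output format are made up for illustration.

```python
import os


def describe_cgi_vars(names, environ=None):
    """Return one line per variable, marking unset and empty values apart."""
    if environ is None:
        environ = os.environ
    lines = []
    for name in names:
        if name not in environ:
            lines.append(f"{name} (unset)")
        elif environ[name] == "":
            lines.append(f'{name} "" (set, empty)')
        else:
            lines.append(f'{name} "{environ[name]}"')
    return lines


if __name__ == "__main__":
    # Minimal CGI response: header block, blank line, then the body.
    print("Content-Type: text/plain")
    print()
    for line in describe_cgi_vars(["REQUEST_URI", "SCRIPT_NAME", "PATH_INFO"]):
        print(line)
```

Run as a CGI script under each server, this would show directly whether PATH_INFO is absent or merely empty for a given URL.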
(9) By sodface on 2022-04-22 01:50:32 in reply to 8
We can't know that without seeing the test.cgi source code.
Maybe here?
(10) By Stephan Beal (stephan) on 2022-04-22 10:16:20 in reply to 9
Maybe here?
Indeed - it looks (at lines 5 and 13) like PATH_INFO is simply not set when it's empty in that environment.