CGI Environment Variables
One of the ways the web server passes information to a CGI script is through
environment variables. The server creates them and assigns them appropriate values
in the environment it sets up for the script's process. They can be accessed
like any other environment variable, for example with getenv() in C or
$ENV{'VARIABLE_NAME'} in Perl. Many of them contain important information that
most CGI programs need to take into account.
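Since all of the variables below are read the same way, a small helper can save repeated NULL checks. This is a minimal C sketch; the helper name cgi_env is hypothetical. It relies on the fact that getenv() returns NULL when a variable is not set.

```c
#include <stdlib.h>

/* Return the value of a CGI environment variable, or `fallback` when the
 * server did not set it. getenv() returns NULL for unset variables, so
 * callers of this helper never have to test for NULL themselves
 * (unless they pass NULL as the fallback). */
const char *cgi_env(const char *name, const char *fallback)
{
    const char *value = getenv(name);
    return value != NULL ? value : fallback;
}
```

For example, cgi_env("REQUEST_METHOD", "") yields an empty string rather than NULL when the program is run outside a web server.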
The list below highlights some of the most
commonly used ones, along with a brief description and notes on possible uses for them.
It is by no means a complete reference: many servers pass extra variables of their own,
or use different names for some of these, so check your server's
documentation. The purpose of this list is only to suggest some good, common uses
for the information the server passes.
CONTENT_LENGTH
The length, in bytes, of the input stream that is being passed through standard input.
This is needed when a script processes input sent with the POST method, so that it reads
exactly the right number of bytes from standard input. Some servers end the input
with EOF, but this behaviour is not guaranteed, so to be sure that you read the
correct amount of input you can do something like:
my $input = '';
read(STDIN, $input, $ENV{CONTENT_LENGTH} || 0);
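The same idea in C, as a sketch: parse CONTENT_LENGTH defensively and read exactly that many bytes. The function names and the 1 MiB cap (MAX_BODY_BYTES) are assumptions for illustration, not part of CGI itself.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical upper limit on accepted request bodies, so a huge
 * CONTENT_LENGTH cannot make the script allocate unbounded memory. */
#define MAX_BODY_BYTES (1024L * 1024L)

/* Parse the CONTENT_LENGTH value. Returns the length, or -1 if the value
 * is missing, not a plain decimal number, negative, or over the limit. */
long parse_content_length(const char *s)
{
    char *end;
    long n;

    if (s == NULL || *s == '\0')
        return -1;
    errno = 0;
    n = strtol(s, &end, 10);
    if (errno != 0 || *end != '\0' || n < 0 || n > MAX_BODY_BYTES)
        return -1;
    return n;
}

/* Read exactly `len` bytes of request body from `in` into a newly
 * allocated, NUL-terminated buffer that the caller must free().
 * fread() returns a short count only on EOF or a read error, so a short
 * count means the body was truncated; NULL is returned in that case. */
char *read_body(FILE *in, long len)
{
    char *buf;

    if (len < 0 || len > MAX_BODY_BYTES)
        return NULL;
    buf = malloc((size_t)len + 1);
    if (buf == NULL)
        return NULL;
    if (fread(buf, 1, (size_t)len, in) != (size_t)len) {
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    return buf;
}
```

A script would typically call read_body(stdin, parse_content_length(getenv("CONTENT_LENGTH"))); a NULL result means the length was missing, invalid or too large, or the body was shorter than announced.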
DOCUMENT_ROOT
The directory relative to which the server resolves all web document paths.
Knowing the server's document root is useful for composing
absolute file paths when the script is given only the
relative path of a file within the web directory. Resolving paths this way
is also good practice, both for security and
for portability. Another common use is to work out what the URL
of a file will be when you only know its absolute path and the hostname (the
hostname is available in another variable, SERVER_NAME).
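A C sketch of resolving a request-supplied relative path against DOCUMENT_ROOT. The function name docroot_path is hypothetical, and the ".." check is a deliberately simple guard, not a complete defence.

```c
#include <stdio.h>
#include <string.h>

/* Join DOCUMENT_ROOT and a relative path taken from the request into
 * `out`. Returns 0 on success, -1 if the result does not fit or if the
 * relative path tries to climb out of the document root with "..".
 * Simplification: symlinks are not considered; a stricter script would
 * canonicalise the result (e.g. with realpath()) and re-check the prefix. */
int docroot_path(const char *docroot, const char *rel, char *out, size_t outsize)
{
    const char *p = rel;
    int n;

    /* Reject any path segment that is exactly "..". */
    while (*p != '\0') {
        size_t seg = strcspn(p, "/");
        if (seg == 2 && p[0] == '.' && p[1] == '.')
            return -1;
        p += seg;
        if (*p == '/')
            p++;
    }

    /* Drop leading slashes so we do not produce "root//file". */
    while (*rel == '/')
        rel++;
    n = snprintf(out, outsize, "%s/%s", docroot, rel);
    return (n < 0 || (size_t)n >= outsize) ? -1 : 0;
}
```

It would be called as docroot_path(getenv("DOCUMENT_ROOT"), rel, buf, sizeof buf) after checking that DOCUMENT_ROOT is set.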
HTTP_REFERER
The URL of the page that referred the web client to the script (via a link or
redirection). Typed URLs and bookmarks usually result in this variable being
empty or unset.
In many cases a script may need to behave differently depending on the
referer. For example, you may want your counter script to
operate only when it is called from one of your own pages, to prevent
someone from using it on another web page without your permission.
The referer may even be the actual data that the script needs
to process. Extending the example above, you might install
your counter on many pages and have the script work out from the referer
which page generated the call and increment the appropriate count, keeping
a separate count for each individual URL. Keep in mind that the referer is
sent by the client and is easily forged or omitted, so a check like this only
discourages casual reuse. A snippet for the referer-blocking example could be:
die unless defined $ENV{HTTP_REFERER} && $ENV{HTTP_REFERER} =~ m{^https?://(www\.)?\Q$mydomain\E/};
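For comparison, here is a C sketch of the same check that parses the referer instead of using a regular expression. The function name and the example domain are hypothetical; it accepts only http or https URLs whose host is the domain itself or its www. form.

```c
#include <string.h>

/* Return 1 if `referer` is an http(s) URL on `domain` or www.`domain`,
 * 0 otherwise (including when no referer was sent). The host must be
 * followed by '/' or end the string, so "example.com.evil.net" does not
 * match "example.com". Host comparison is case-sensitive here, which is
 * a simplification. */
int referer_matches(const char *referer, const char *domain)
{
    size_t dlen = strlen(domain);
    const char *p;

    if (referer == NULL)
        return 0;
    if (strncmp(referer, "http://", 7) == 0)
        p = referer + 7;
    else if (strncmp(referer, "https://", 8) == 0)
        p = referer + 8;
    else
        return 0;
    /* Allow an optional "www." in front of the domain. */
    if (strncmp(p, "www.", 4) == 0 && strncmp(p + 4, domain, dlen) == 0)
        p += 4;
    if (strncmp(p, domain, dlen) != 0)
        return 0;
    p += dlen;
    return *p == '/' || *p == '\0';
}
```

A counter could then refuse to run with if (!referer_matches(getenv("HTTP_REFERER"), "example.com")) return 1;, where example.com stands in for your own domain.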
HTTP_USER_AGENT
The name/version of the client issuing the request to the script.
As with referers, a script might need to behave differently depending
on the client software used to call it. A redirection script
could use this information to point the client to a page optimized
for a specific browser, or you may want your script to block requests from
specific clients, such as robots, or browsers known not to support
features that the script's normal output relies on.
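A minimal C sketch of user-agent blocking by substring match. The token list is a hypothetical placeholder, not an authoritative list of robots.

```c
#include <string.h>

/* Hypothetical substrings identifying clients to block. Real robot
 * User-Agent strings vary widely; adjust the list to your own needs. */
static const char *const blocked_agents[] = { "bot", "crawler", "spider" };

/* Return 1 if the User-Agent string contains any blocked token. A missing
 * User-Agent is treated as not blocked. Matching is case-sensitive. */
int agent_is_blocked(const char *agent)
{
    size_t i;

    if (agent == NULL)
        return 0;
    for (i = 0; i < sizeof blocked_agents / sizeof blocked_agents[0]; i++)
        if (strstr(agent, blocked_agents[i]) != NULL)
            return 1;
    return 0;
}
```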
PATH_INFO
The extra path information following the script's path in the URL.
A URL that refers to a script may contain additional information,
commonly called 'extra path information'. It is appended to
the script's URL and begins with a slash; for example, in a request for
/cgi-bin/counter.cgi/stats/index, PATH_INFO is /stats/index. The server puts
this information in the PATH_INFO variable, which can be used as a way
to pass arguments to the script.
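As a sketch, a C script could split PATH_INFO into slash-separated arguments like this. The function name, the ARG_LEN limit, and the choice to skip empty segments are assumptions.

```c
#include <stdio.h>
#include <string.h>

#define ARG_LEN 64   /* hypothetical maximum length of one argument */

/* Split PATH_INFO (e.g. "/stats/index") into its slash-separated
 * segments, copying them into `args`. Empty segments from repeated
 * slashes are skipped. Returns the number of segments stored, at most
 * `max_args`; longer segments are truncated to ARG_LEN - 1 characters. */
size_t split_path_info(const char *path_info, char args[][ARG_LEN], size_t max_args)
{
    size_t count = 0;
    const char *p = path_info;

    if (p == NULL)
        return 0;
    while (*p != '\0' && count < max_args) {
        size_t len = strcspn(p, "/");
        if (len > 0) {
            snprintf(args[count], ARG_LEN, "%.*s", (int)len, p);
            count++;
        }
        p += len;
        if (*p == '/')
            p++;
    }
    return count;
}
```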
PATH_TRANSLATED
The PATH_INFO mapped onto DOCUMENT_ROOT.
PATH_INFO is often used to pass a path argument to the script. For example,
a counter might be passed the path to the file where counts should be stored.
The server also maps PATH_INFO onto the document root and stores the result
in PATH_TRANSLATED, which can be used directly as an
absolute file path.
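A C sketch of the counter mentioned above, taking the counter file's path from PATH_TRANSLATED. The function name bump_count is hypothetical, and the file format (a single decimal number) is an assumption.

```c
#include <stdio.h>

/* Increment the count stored as a decimal number in the file at `path`
 * and return the new count, or -1 on I/O error. A missing, empty or
 * garbled file starts the count at zero. No locking is done: concurrent
 * requests can lose updates, so a real counter should lock the file
 * (for example with fcntl() or flock()). */
long bump_count(const char *path)
{
    long count = 0;
    FILE *f = fopen(path, "r");

    if (f != NULL) {
        if (fscanf(f, "%ld", &count) != 1)
            count = 0;
        fclose(f);
    }
    count++;

    f = fopen(path, "w");
    if (f == NULL)
        return -1;
    fprintf(f, "%ld\n", count);
    if (fclose(f) != 0)
        return -1;
    return count;
}
```

The script would call bump_count(getenv("PATH_TRANSLATED")) after checking that the variable is set. Because PATH_TRANSLATED is derived from the client-supplied PATH_INFO, a real script should also restrict which files it is willing to modify.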
QUERY_STRING
Contains query information passed via the calling URL, following a question
mark after the script location.
QUERY_STRING is the equivalent of the content passed through STDIN with POST, but
for scripts called with the GET method. Query arguments are stored in this
variable in their URL-encoded form, exactly as they appear in the calling
URL. You can parse this string to extract parameters for the script.
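Decoding is the part most often done wrong, so here is a C sketch that looks up one parameter and URL-decodes it ('+' becomes a space and %XX becomes the byte with that hex value). The function names are hypothetical; the sketch does not handle repeated parameters or decode parameter names.

```c
#include <string.h>

/* Value of one hex digit, or -1 if `c` is not a hex digit. */
static int hexval(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Decode `len` bytes of URL-encoded text into `out` (outsize must be at
 * least 1). The result is always NUL-terminated and truncated if needed.
 * A '%' not followed by two hex digits is copied through unchanged. */
static void url_decode(const char *s, size_t len, char *out, size_t outsize)
{
    size_t i = 0, o = 0;

    while (i < len && o + 1 < outsize) {
        if (s[i] == '+') {
            out[o++] = ' ';
            i++;
        } else if (s[i] == '%' && i + 2 < len
                   && hexval(s[i + 1]) >= 0 && hexval(s[i + 2]) >= 0) {
            out[o++] = (char)(hexval(s[i + 1]) * 16 + hexval(s[i + 2]));
            i += 3;
        } else {
            out[o++] = s[i++];
        }
    }
    out[o] = '\0';
}

/* Find parameter `name` in a query string such as "a=1&b=hello+world"
 * and decode its value into `out`. Returns 1 if found, 0 otherwise. */
int query_param(const char *query, const char *name, char *out, size_t outsize)
{
    size_t nlen = strlen(name);
    const char *p = query;

    while (p != NULL && *p != '\0') {
        size_t pair = strcspn(p, "&");
        if (pair > nlen && strncmp(p, name, nlen) == 0 && p[nlen] == '=') {
            url_decode(p + nlen + 1, pair - nlen - 1, out, outsize);
            return 1;
        }
        p += pair;
        if (*p == '&')
            p++;
    }
    return 0;
}
```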
REMOTE_ADDR
The IP address from which the client is issuing the request.
This can be useful for logging accesses to the script (for example,
a voting script might log voters by IP address in a file in order
to prevent them from voting more than once) or for blocking or behaving
differently for particular IP addresses (for example, in a script that
must be restricted to your local network and perhaps perform different
tasks for each known host).
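For the local-network case, here is a C sketch that checks whether REMOTE_ADDR falls inside an IPv4 subnet. The function names and the example network 192.168.1.0/24 are assumptions; IPv6 clients are simply treated as outside the subnet.

```c
#include <stdio.h>

/* Parse a dotted-quad IPv4 address into a 32-bit value. Returns 0 on
 * success, -1 if `s` is not of the form a.b.c.d with each part 0-255
 * (trailing characters are rejected via the extra %c conversion). */
static int parse_ipv4(const char *s, unsigned long *out)
{
    unsigned int a, b, c, d;
    char extra;

    if (s == NULL
        || sscanf(s, "%u.%u.%u.%u%c", &a, &b, &c, &d, &extra) != 4
        || a > 255 || b > 255 || c > 255 || d > 255)
        return -1;
    *out = ((unsigned long)a << 24) | ((unsigned long)b << 16)
         | ((unsigned long)c << 8) | (unsigned long)d;
    return 0;
}

/* Return 1 if `addr` lies in the network `net`/`prefix_len`, 0 if it
 * does not or if either address fails to parse. */
int ipv4_in_subnet(const char *addr, const char *net, unsigned prefix_len)
{
    unsigned long a, n, mask;

    if (prefix_len > 32 || parse_ipv4(addr, &a) != 0 || parse_ipv4(net, &n) != 0)
        return 0;
    mask = prefix_len == 0 ? 0UL
                           : (0xFFFFFFFFUL << (32 - prefix_len)) & 0xFFFFFFFFUL;
    return (a & mask) == (n & mask);
}
```

A LAN-only script might start with if (!ipv4_in_subnet(getenv("REMOTE_ADDR"), "192.168.1.0", 24)) and refuse the request.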
REMOTE_HOST
The name of the host from which the client issues the request.
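Like REMOTE_ADDR above, except that this is the hostname of the remote
machine, if the server could determine it via a reverse DNS lookup. When the
hostname is not available, the server may leave the variable unset or fill it
with the IP address instead.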
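Because REMOTE_HOST is often missing, a logging script might fall back to REMOTE_ADDR. A C sketch, with a hypothetical function name:

```c
#include <stddef.h>

/* Pick a label for the client when logging: REMOTE_HOST if the server
 * provided a non-empty value, otherwise REMOTE_ADDR, otherwise "unknown".
 * Pass in the results of getenv("REMOTE_HOST") and getenv("REMOTE_ADDR"). */
const char *client_label(const char *remote_host, const char *remote_addr)
{
    if (remote_host != NULL && remote_host[0] != '\0')
        return remote_host;
    if (remote_addr != NULL && remote_addr[0] != '\0')
        return remote_addr;
    return "unknown";
}
```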
REQUEST_METHOD
The method used for the request. (usually GET, POST or HEAD)
It is wise to have your script check this variable before doing anything else. From it
you can tell where the input will be (STDIN for POST, QUERY_STRING for GET),
or you can choose to permit operation under only one of the two methods. It is also
a good idea to exit with an explanatory error message if the script is accidentally
run from the command line, in which case the variable is not defined.
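A C sketch of checking REQUEST_METHOD before anything else. The enum and function names are hypothetical.

```c
#include <string.h>

enum cgi_method { METHOD_NONE, METHOD_GET, METHOD_POST, METHOD_HEAD, METHOD_OTHER };

/* Classify the REQUEST_METHOD value. METHOD_NONE means the variable is
 * unset, which usually means the program was started from a command line
 * rather than by a web server. */
enum cgi_method classify_method(const char *method)
{
    if (method == NULL)
        return METHOD_NONE;
    if (strcmp(method, "GET") == 0)
        return METHOD_GET;
    if (strcmp(method, "POST") == 0)
        return METHOD_POST;
    if (strcmp(method, "HEAD") == 0)
        return METHOD_HEAD;
    return METHOD_OTHER;
}
```

In main(), a METHOD_NONE result from classify_method(getenv("REQUEST_METHOD")) is the cue to print an explanatory message to stderr and exit with a non-zero status.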
SCRIPT_NAME
The virtual path from which the script is executed.
This is very useful if your script outputs HTML code that contains
calls to itself. Having the script determine its own virtual path (and hence,
together with SERVER_NAME and SERVER_PORT, its full URL) is much more portable than
hard-coding it in a configuration variable. Also, if you keep a log
of all script accesses in a file and want each script to record
its name along with the calling parameters or time, SCRIPT_NAME gives you
a portable way to record the path of the script.
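A C sketch of a script writing a form that posts back to itself via SCRIPT_NAME. The function name is hypothetical, and rejecting a double quote instead of HTML-escaping it is a simplification.

```c
#include <stdio.h>
#include <string.h>

/* Write a form tag whose action is the script's own virtual path, so the
 * script keeps working if it is renamed or moved. Returns 0 on success,
 * -1 if SCRIPT_NAME is missing, contains a double quote (which would end
 * the attribute early), or the tag does not fit in `out`. */
int self_form_tag(const char *script_name, char *out, size_t outsize)
{
    int n;

    if (script_name == NULL || strchr(script_name, '"') != NULL)
        return -1;
    n = snprintf(out, outsize, "<form method=\"POST\" action=\"%s\">", script_name);
    return (n < 0 || (size_t)n >= outsize) ? -1 : 0;
}
```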
SERVER_NAME
The web server's hostname or IP address.
Like SCRIPT_NAME, this value can be used to make scripts more portable
when they need to assemble URLs pointing to the local machine. For scripts
that are publicly accessible on a system with many virtual hosts, it also
makes it possible to behave differently depending on the virtual host
through which the script was requested.
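A C sketch of per-virtual-host behaviour keyed on SERVER_NAME. The host names and directories in the table are placeholders.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical mapping from virtual host names to per-site data
 * directories; replace the entries with your own hosts. */
struct vhost { const char *name; const char *data_dir; };

static const struct vhost vhosts[] = {
    { "www.example.com", "/var/cgi-data/example" },
    { "www.example.org", "/var/cgi-data/example-org" },
};

/* Return the data directory for SERVER_NAME, or NULL when the host is
 * unknown (the script could then refuse the request). The match is
 * exact and case-sensitive, although hostnames are case-insensitive in
 * general; strcasecmp() from <strings.h> would be more lenient. */
const char *data_dir_for(const char *server_name)
{
    size_t i;

    if (server_name == NULL)
        return NULL;
    for (i = 0; i < sizeof vhosts / sizeof vhosts[0]; i++)
        if (strcmp(server_name, vhosts[i].name) == 0)
            return vhosts[i].data_dir;
    return NULL;
}
```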
SERVER_PORT
The web server's listening port.
Complements SERVER_NAME above in forming URLs that point to the local system.
This is commonly overlooked, but your script will be more portable
if you keep in mind that not all servers run on the default port (80 for HTTP),
and those that do not need an explicit port in the server part of the URL.
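Putting SERVER_NAME, SERVER_PORT and a path such as SCRIPT_NAME together, here is a C sketch of building an absolute URL that includes the port only when it is not 80. The function name is hypothetical, and it assumes plain HTTP.

```c
#include <stdio.h>
#include <string.h>

/* Build an absolute http:// URL for `path` (which should start with '/')
 * from SERVER_NAME and SERVER_PORT, leaving out the port when it is the
 * HTTP default, 80. Returns 0 on success, -1 on missing input or if the
 * URL does not fit in `out`. HTTPS (default port 443) is not handled. */
int self_url(const char *server_name, const char *server_port,
             const char *path, char *out, size_t outsize)
{
    int n;

    if (server_name == NULL || path == NULL)
        return -1;
    if (server_port == NULL || server_port[0] == '\0' || strcmp(server_port, "80") == 0)
        n = snprintf(out, outsize, "http://%s%s", server_name, path);
    else
        n = snprintf(out, outsize, "http://%s:%s%s", server_name, server_port, path);
    return (n < 0 || (size_t)n >= outsize) ? -1 : 0;
}
```

A script would typically pass getenv("SERVER_NAME"), getenv("SERVER_PORT") and getenv("SCRIPT_NAME") to get its own full URL.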