SoFunction
Updated on 2025-03-10

Use a custom PHP application to obtain web server status information

Most web hosting companies give customers access to website statistics, but the status information generated by the server is often not comprehensive enough. For example, if an incorrectly configured web server does not recognize certain file types, files of those types will not appear in the status information. Fortunately, you can write a custom PHP program that collects exactly the status information you need.


Structure of Common Logfile Format (CLF)

 
CLF was originally designed by NCSA for HTTPd, its World Wide Web server software. CERN HTTPd is a public domain web server maintained by the World Wide Web Consortium (W3C), and the W3C website lists the log file specification. Web servers on both Microsoft and UNIX-based platforms can generate log files in CLF. The format is as follows:
Host Ident Authuser Time_Stamp "request" Status_code File_size

For example:
21.53.48.83 - - [22/Apr/2002:22:19:12 -0500] "GET / HTTP/1.0" 200 8237

Below is a field-by-field breakdown of the log entry:

Host is the IP address or DNS name of the website visitor; in the above example, it is 21.53.48.83.
Ident is the remote identity of the visitor (RFC 931). A dash indicates "unspecified".
Authuser is the user ID (if the web server has authenticated the website visitor).
Time_Stamp is the time of the request as recorded by the server, in the format "day/month/year:hour:minute:second zone".
Request is the HTTP request line sent by the website visitor, which begins with a method such as GET or POST.
Status_Code is the status code returned by the server; for example, 200 means "OK: the browser request was successful".
File_Size is the size of the file requested by the user. In this case, it is 8237 bytes.
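
The field breakdown above can be checked by going in the other direction and splitting an existing entry back into its parts. The following is only a minimal sketch: the function name parseClfLine and the array keys are hypothetical, and the regular expression assumes well-formed entries like the example above.

```php
<?php
// Hypothetical helper: split one CLF line into its seven named fields.
// Returns null when the line does not match the expected layout.
function parseClfLine(string $line): ?array
{
    $pattern = '/^(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]*)" (\d{3}) (\d+|-)$/';
    if (preg_match($pattern, trim($line), $m) !== 1) {
        return null;
    }
    return [
        'host'      => $m[1],
        'ident'     => $m[2],
        'authuser'  => $m[3],
        'timestamp' => $m[4],
        'request'   => $m[5],
        'status'    => (int) $m[6],
        // A dash in the size field is treated as zero bytes (an assumption of this sketch)
        'size'      => $m[7] === '-' ? 0 : (int) $m[7],
    ];
}

// Parse the sample entry shown earlier
$entry = parseClfLine('21.53.48.83 - - [22/Apr/2002:22:19:12 -0500] "GET / HTTP/1.0" 200 8237');
```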


Server status code

 
You can find the server status code specification developed by W3C in the HTTP standard. These status codes generated by the server indicate whether the data transmission between the browser and the server is successful or not. These codes are usually passed to the browser (for example, the very famous 404 error "page not found") or added to the server log.
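
As a rough illustration, the first digit of a status code identifies its class. The sketch below groups codes that way; the function name statusClass and the label strings are hypothetical and not part of any standard API.

```php
<?php
// Hypothetical helper: map a status code to the class given by its first digit.
function statusClass(int $code): string
{
    switch (intdiv($code, 100)) {
        case 1: return 'informational';
        case 2: return 'success';
        case 3: return 'redirection';
        case 4: return 'client error';
        case 5: return 'server error';
        default: return 'unknown';
    }
}
```

For example, statusClass(404) yields 'client error', which matches the "page not found" case mentioned above.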


Collect data

The first step in creating our custom application is to get user data. Whenever a user selects a resource on the website, we want to create a corresponding log entry. Fortunately, the existence of server variables allows us to query the user's browser and get data.

Server variables carry information about the request that the browser sent to the server. REMOTE_ADDR is one example of a server variable; it returns the user's IP address:
Example output: 27.234.125.222

The following PHP code will display the IP address of the current user:
<?php echo $_SERVER['REMOTE_ADDR']; ?>

Let's take a look at the code for our PHP application. First, we need to define the website resource we want to track and specify the file size:
//Get the file name we want to record
$fileName="";
$fileSize="92292";

You don't need to save these values into static variables. If you want to keep track of many entries, you can save them to an array or database. In this case, you might want to find each entry through an external link, like so:
<a href="?bannerid=123"><img src="" border="0"></a>

where "123" means the record corresponding to "". Then, we query the user's browser through the server variable. This way we get the data we need to add a new entry to our log file:
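//Get CLF information from website viewers
//(?? "" avoids undefined-key warnings when the server does not set a variable; requires PHP 7 or later)
$host=$_SERVER['REMOTE_ADDR'] ?? "";
$ident=$_SERVER['REMOTE_IDENT'] ?? "";
$auth=$_SERVER['REMOTE_USER'] ?? "";
$timeStamp=date("d/M/Y:H:i:s O");
$reqType=$_SERVER['REQUEST_METHOD'] ?? "";
$servProtocol=$_SERVER['SERVER_PROTOCOL'] ?? "";
$statusCode="200";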

We then check whether any server variable came back empty. According to the CLF specification, empty values should be replaced by dashes, so the next code block finds each empty value and replaces it with a dash:
//Replace empty values with a dash (according to the specification)
if ($host==""){ $host="-"; }
if ($ident==""){ $ident="-"; }
if ($auth==""){ $auth="-"; }
if ($reqType==""){ $reqType="-"; }
if ($servProtocol==""){ $servProtocol="-"; }
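
The repeated checks above could also be folded into one small function. This is a sketch only; the name clfField is hypothetical, and it treats both a missing value (null) and an empty string as "unspecified".

```php
<?php
// Hypothetical helper: return the value, or "-" when it is missing or empty.
function clfField(?string $value): string
{
    return ($value === null || $value === '') ? '-' : $value;
}

// Example: REMOTE_IDENT is often not set by the server at all
$ident = clfField($_SERVER['REMOTE_IDENT'] ?? null);
```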

Once we have the necessary information, we assemble these values into a string that complies with the CLF specification:
//Create a CLF format string
$clfString=$host." ".$ident." ".$auth." [".$timeStamp."] \"".$reqType." /".$fileName." ".$servProtocol."\" ".$statusCode." ".$fileSize."\r\n";
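
Long concatenations like this one are easy to get wrong, so one possible alternative is to let sprintf lay out the fields. The sketch below assumes the values have already been cleaned as described above; the function name buildClfString is hypothetical.

```php
<?php
// Hypothetical helper: assemble one CLF line from already-cleaned field values.
function buildClfString(
    string $host, string $ident, string $auth, string $timeStamp,
    string $reqType, string $fileName, string $servProtocol,
    string $statusCode, string $fileSize
): string {
    // Same layout as the concatenated version: host ident auth [time] "method /file protocol" status size
    return sprintf(
        "%s %s %s [%s] \"%s /%s %s\" %s %s\r\n",
        $host, $ident, $auth, $timeStamp,
        $reqType, $fileName, $servProtocol, $statusCode, $fileSize
    );
}

// Rebuild the sample entry from the beginning of the article
$line = buildClfString('21.53.48.83', '-', '-', '22/Apr/2002:22:19:12 -0500',
    'GET', '', 'HTTP/1.0', '200', '8237');
```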

Create a custom log file

Now, the formatted data can be stored in our custom log files. First, we will establish a file naming convention and write code that generates a new log file every day. In the examples given in this article, each file name starts with "weblog-", followed by the date as month, day, and year (two digits each), and ends with the file extension .log. The .log extension generally denotes a server log file. (In fact, most log analyzers search for .log files.)
// Name the log file with the current date
$logPath="./log/";
$logFile=$logPath."weblog-".date("mdy").".log";
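
If the file name is needed in more than one place, the naming rule can be wrapped in a function. This is a minimal sketch; the name dailyLogFile and the optional timestamp parameter (useful for testing or for locating an older log) are assumptions that do not appear in the original code.

```php
<?php
// Hypothetical helper: build the log file path for a given day (defaults to today).
function dailyLogFile(string $logPath, ?int $timestamp = null): string
{
    $timestamp = $timestamp ?? time();
    // rtrim keeps the result free of doubled slashes when $logPath ends with "/"
    return rtrim($logPath, '/') . '/weblog-' . date('mdy', $timestamp) . '.log';
}
```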

Now, we need to determine whether the current log file exists. If it exists, we add an entry to it; otherwise, the application creates a new log file. (The creation of new log files usually occurs when the date changes, because the file name changes at this time.)
//Check whether the log file already exists
if (file_exists($logFile)) {
    //If it exists, open the existing log file for appending
    $fileWrite = fopen($logFile, "a");
} else {
    //Otherwise, create a new log file
    $fileWrite = fopen($logFile, "w");
}

If you receive a "Permission denied" error message when writing or appending to a file, change the permissions of the target log folder to allow write operations. The default permissions on most web servers are "readable and executable". You can change the permissions of the folder with the chmod command or with an FTP client.
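
It can help to detect a permissions problem before the first write is attempted. The following sketch checks the log folder up front; the function name ensureLogDir and the 0755 mode are assumptions, and the right mode depends on how your server runs PHP.

```php
<?php
// Hypothetical helper: make sure the log directory exists and is writable.
function ensureLogDir(string $logPath): bool
{
    // Create the directory (and any missing parents) if it is not there yet
    if (!is_dir($logPath) && !mkdir($logPath, 0755, true)) {
        return false;
    }
    return is_writable($logPath);
}
```

If the function returns false, adjusting the folder permissions as described above is the likely fix.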

Then, we create a file locking mechanism so that when two or more users access the log file at the same time, only one of them can write to the file:
//Create a locking mechanism for file write operations
//(writers need an exclusive lock; LOCK_SH would let several processes write at once)
flock($fileWrite, LOCK_EX);

Finally, we write the content of the entry:
//Write CLF entries
fwrite($fileWrite,$clfString);
//Unlock file
flock($fileWrite, LOCK_UN);
//Close the log file
fclose($fileWrite);
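
Putting the open, lock, write, and close steps together, a single function might look like the sketch below. The name appendLogEntry is hypothetical. The sketch takes an exclusive lock (LOCK_EX), which is the mode intended for writers, and relies on the fact that fopen in "a" mode creates the file when it does not exist.

```php
<?php
// Hypothetical helper: append one entry to the log under an exclusive lock.
function appendLogEntry(string $logFile, string $clfString): bool
{
    $handle = fopen($logFile, 'a'); // "a" creates the file when it is missing
    if ($handle === false) {
        return false;
    }
    $written = false;
    if (flock($handle, LOCK_EX)) {
        $written = fwrite($handle, $clfString) !== false;
        fflush($handle);           // push the data out before releasing the lock
        flock($handle, LOCK_UN);
    }
    fclose($handle);
    return $written;
}
```

A shorter alternative with the same intent is file_put_contents($logFile, $clfString, FILE_APPEND | LOCK_EX), which opens, locks, appends, and closes in one call.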

Process log data

 
Once the system is in production, customers will want a detailed statistical analysis of the collected visitor data. Because all of the custom log files use a standard format, any log analyzer can process them. A log analyzer is a tool that analyzes large log files and produces pie charts, histograms, and other statistical graphs. Log analyzers are also used to aggregate information such as which users are visiting your website and how many clicks it receives.

Here are a few popular log analyzers:

WebTrends is a very good log analyzer that is suitable for large-scale websites as well as enterprise-level networks.
Analog is a popular free log analyzer.
Webalizer is a free analysis program. It can generate HTML reports so that most web browsers can view its reports.
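
For a quick check without a full analyzer, a few lines of PHP can already summarize a log. The sketch below counts how often each status code occurs; the function name countStatusCodes is hypothetical, and the pattern assumes entries in the CLF layout described earlier.

```php
<?php
// Hypothetical helper: count how often each status code appears in CLF lines.
function countStatusCodes(array $lines): array
{
    $counts = [];
    foreach ($lines as $line) {
        // The status code is the three-digit field right after the quoted request
        if (preg_match('/" (\d{3}) (?:\d+|-)\s*$/', $line, $m) === 1) {
            $counts[$m[1]] = ($counts[$m[1]] ?? 0) + 1;
        }
    }
    ksort($counts);
    return $counts;
}
```

The lines could come from PHP's file() function, which reads a whole log file into an array with one element per line.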

Comply with standards

We can easily extend the application to support other types of logging. This way you can capture more data, such as the browser type and the referrer (the previous page that linked to the current page). The lesson here is that following standards and conventions when you program will ultimately simplify your work.
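
One common way to record the referrer and browser type is to append them as two extra quoted fields after the CLF fields, as the NCSA combined log format does. The sketch below takes that approach; the function name toCombinedFormat is hypothetical, and the handling of embedded quotes is a simplifying assumption.

```php
<?php
// Hypothetical helper: append referrer and browser type to an existing CLF line.
function toCombinedFormat(string $clfString, ?string $referrer, ?string $userAgent): string
{
    // Quote a value, using "-" when it is missing and escaping embedded quotes
    $quote = function (?string $value): string {
        $value = ($value === null || $value === '') ? '-' : $value;
        return '"' . str_replace('"', '\"', $value) . '"';
    };
    return rtrim($clfString, "\r\n") . ' ' . $quote($referrer) . ' ' . $quote($userAgent) . "\r\n";
}

// Both values come from request headers and may be absent
$combined = toCombinedFormat(
    "21.53.48.83 - - [22/Apr/2002:22:19:12 -0500] \"GET / HTTP/1.0\" 200 8237\r\n",
    $_SERVER['HTTP_REFERER'] ?? null,
    $_SERVER['HTTP_USER_AGENT'] ?? null
);
```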