Search Engine Friendly URL Design
Copyright statement: This article may be freely reprinted. When reprinting, please indicate the original source of the article, the author information, and this statement in the form of a hyperlink.
/tech/google_url.html
Keywords: "url rewrite" mod_rewrite isapirewrite path_info "search engine friendly"
Content summary:
As the content on the Internet grows at an astonishing speed, search engines become ever more important. If a website wants to be well covered by search engines, search-engine-friendly design matters as much as user-friendly design. The more of a site's pages a search engine indexes, the greater the chance that users will find the site through different keywords. The Google Algorithm Survey article also mentions that the number of a site's pages indexed by Google has some effect on PageRank. Google emphasizes the relatively static part of the web (it indexes comparatively few dynamic pages), and static pages with stable link addresses are better suited to being indexed (no wonder the archived mailing lists and monthly archives of many large sites turn up so easily in searches). For this reason, many articles on search engine URL optimization (URI Pretty) recommend mechanisms that make parameterized dynamic pages look like static ones:
For example, a URL such as:
/?mode=man&parameter=ls
becomes:
//man/ls
There are two main ways to implement it:
Based on URL rewrite
Based on PATH_INFO
Passing the URI address as parameters: URL REWRITE
The simplest approach is URL conversion based on the URL rewrite module available in various web servers.
With it, links like ?id=234 can be mapped to paths under news/ with almost no change to the program, and from the outside they look the same as static links. On the Apache server there is a module for this, mod_rewrite (not enabled by default). URL rewriting is powerful enough to fill a book.
To map ?id=234 to news/, a rule like the following is enough (in the server or virtual-host configuration, with RewriteEngine On):
RewriteRule ^/news/(\d+)\.html$ /news.asp?id=$1 [NC,L]
This maps a request like /news/ to /?id=234.
When a request for /news/ arrives, the web server internally forwards it to /?id=234.
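The effect of such a rule can be imitated outside the web server. The following is a minimal Python sketch, not the server's actual implementation: it reuses the pattern and target from the rule above, while the function name rewrite_news and the sample paths are assumptions made for illustration.

```python
import re

# Pattern and target mirror the rewrite rule above; matching is
# case-insensitive, as the rule's flags request.
NEWS_RULE = re.compile(r"^/news/(\d+)\.html$", re.IGNORECASE)


def rewrite_news(path: str) -> str:
    """Return the internal dynamic URL for a static-looking news path.

    Paths that do not match the rule are returned unchanged.
    (Hypothetical helper, for illustration only.)
    """
    return NEWS_RULE.sub(r"/news.asp?id=\1", path)


print(rewrite_news("/news/234.html"))  # /news.asp?id=234
print(rewrite_news("/about.html"))     # /about.html
```

The visitor only ever sees the first form; the second form exists only inside the server.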
IIS has corresponding rewrite modules as well, such as ISAPI REWRITE and IIS REWRITE. Their syntax is also based on regular expressions, so the configuration is almost the same as for Apache's mod_rewrite. For a simple application, the rule can be:
RewriteRule /news/(\d+)\.html /news/news\.php\?id=$1 [N,I]
This maps /news/ to /news/?id=234
A more general expression covers the parameters of all dynamic pages. It lets
/?a=A&b=B&c=C
be expressed as
//a/A/b/B/c/C
using the rule:
RewriteRule (.*?\.php)(\?[^/]*)?/([^/]*)/([^/]*)(.+?)? $1(?2$2&:\?)$3=$4?5$5: [N,I]
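The general idea behind this rule, turning name/value path segments back into a query string, can be sketched in Python. This is an illustration under assumptions rather than a translation of the rule: the script name index.php and the function path_to_query are hypothetical, and unlike the rule the sketch does not merge an already existing query string.

```python
from urllib.parse import urlencode


def path_to_query(url_path: str, script_suffix: str = ".php") -> str:
    """Turn /script.php/a/A/b/B into /script.php?a=A&b=B.

    Hypothetical helper: segments after the script name are read as
    alternating parameter names and values.
    """
    marker = script_suffix + "/"
    head, sep, tail = url_path.partition(marker)
    if not sep:
        return url_path  # no trailing path segments, nothing to convert
    script = head + script_suffix
    parts = [p for p in tail.split("/") if p]
    # Pair up name/value segments; a dangling name without a value is dropped.
    pairs = list(zip(parts[0::2], parts[1::2]))
    if not pairs:
        return script
    return script + "?" + urlencode(pairs)


print(path_to_query("/index.php/a/A/b/B/c/C"))  # /index.php?a=A&b=B&c=C
```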
Another advantage of URL rewriting is that it hides the back-end implementation, which is very useful when migrating the back-end platform: when moving from ASP to Java, for example, front-end users never notice the change. Likewise, when the application has to move from ?id=234 to ?query=234, the public address can stay news/ throughout. This separates the application from its public presentation and keeps URLs stable. mod_rewrite can even forward requests to other back-end servers.
URL beautification based on PATH_INFO
Another way to beautify URLs is based on PATH_INFO.
PATH_INFO is part of the CGI 1.1 standard. The "/value_1/value_2" segments often seen after the name of a CGI program are its PATH_INFO:
For example, in //man/ls: $PATH_INFO = "/man/ls"
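How a request path divides into the program name and PATH_INFO can be sketched as follows. This is a simplified Python illustration, not how any particular web server implements it; the script name /app.php and the function split_path_info are assumptions.

```python
from typing import Tuple


def split_path_info(request_path: str, script_name: str) -> Tuple[str, str]:
    """Split a request path into (SCRIPT_NAME, PATH_INFO).

    Hypothetical helper: everything after the script name is PATH_INFO,
    and PATH_INFO is empty when nothing follows the script name.
    """
    if request_path == script_name:
        return script_name, ""
    if request_path.startswith(script_name + "/"):
        return script_name, request_path[len(script_name):]
    raise ValueError("request path does not start with the script name")


print(split_path_info("/app.php/man/ls", "/app.php"))  # ('/app.php', '/man/ls')
```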
Because PATH_INFO is a CGI standard, it is supported by PHP, Servlets, and others. For example, Servlets have a getPathInfo() method.
Note: for /myapp/servlet/Hello/foo, getPathInfo() returns /foo, and for /myapp/dir/foo, getPathInfo() returns /. This also shows that jsp is actually the PATH_INFO parameter of a Servlet. ASP does not support PATH_INFO.
An example of parsing parameters from PATH_INFO in PHP:
// Note: the parameters are separated by "/", and the first element is empty
// because the path starts with "/". Parses $param1 and $param2 from /param1/param2.
if (isset($_SERVER["PATH_INFO"])) {
    // array_pad avoids undefined-offset notices when fewer segments are present
    list($nothing, $param1, $param2) = array_pad(explode('/', $_SERVER["PATH_INFO"]), 3, null);
}
How to hide the application type, for example the .php extension:
In APACHE, configure it like this:
<FilesMatch "^app_name$">
ForceType application/x-httpd-php
</FilesMatch>
How to look even more like a static page: app_name/my/
When parsing PATH_INFO, strip the trailing five characters ".html" from the last parameter.
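The suffix handling described here can be sketched in Python as follows. It is a minimal illustration under assumptions: the function name parse_path_info and the sample value "/my/app.html" are hypothetical and do not come from the original application.

```python
from typing import List


def parse_path_info(path_info: str, suffix: str = ".html") -> List[str]:
    """Split PATH_INFO into parameters and drop a static-looking suffix.

    Hypothetical helper: empty segments (including the one before the
    leading "/") are discarded, and the suffix is removed only from the
    last parameter.
    """
    parts = [p for p in path_info.split("/") if p]
    if parts and parts[-1].endswith(suffix):
        parts[-1] = parts[-1][:-len(suffix)]
    return parts


print(parse_path_info("/my/app.html"))  # ['my', 'app']
```

Stripping only the final segment keeps intermediate parameters intact even if they happen to contain ".html".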
Note: Apache 2 may reject PATH_INFO by default (this depends on the handler), so AcceptPathInfo On needs to be set.
For users on virtual hosts in particular, PATH_INFO is often the only option, since they have no permission to install and configure mod_rewrite.
So from now on, when you see a page that looks static, you will know it may really be a dynamic page generated by a PHP program such as article/?id=234. Many sites appear to have many static directories, but it is quite likely that everything is published by just one or two programs. Many WikiWiki systems work this way: the whole system is a single small wiki program, and what look like directories are really the results of that program being queried with the rest of the address as parameters.
Retrofitting an existing dynamic publishing system with a solution based on mod_rewrite or PATH_INFO plus a cache server can also greatly reduce the cost of upgrading the old system to a new content management system, and it makes the content easier for search engines to index.
Appendix: how to support PATH_INFO with PHP on IIS
Version used: php-4.2.3-Win32
Unpacking directory
========
php-4.2. c:\php
Initialize the file
=================
Copy: c:\php\-dist to c:\winnt\
Configuration file association
============
Follow the instructions to configure the file association
Run the library file
==========
Copy c:\php\ to c:\winnt\system32\
With this setup, PHP maps PATH_INFO to a physical path, and requests fail with warnings like these:
Warning: Unknown(C:\CheDong\Downloads\ariadne\www\\path): failed to create stream: No such file or directory in Unknown on line 0
Warning: Unknown(): Failed opening 'C:\CheDong\Downloads\ariadne\www\\path' for inclusion (include_path='.;c:\php4\pear') in Unknown on line 0
Install ariadne's PATCH
==================
Stop IIS Service
net stop iisadmin
ftp:///pub/ariadne/win/iis/php-4.2.3/
Overwrite the original c:\php\sapi\
Note:
ariadne is a content publishing system based on PATH_INFO.
In PHP 4.3.2 RC2, PATH_INFO in CGI mode has been fixed, so it can simply be installed as usual.
References:
URL Rewrite documentation:
/docs/
/docs/mod/mod_rewrite.html
/docs-2.0/mod/mod_rewrite.html
Search engine friendly URL design
/article/485
This URL may originally have been ?id=485
An open source content management system based on PATH_INFO
/
What does Google not index?
/newsGoogle/2003/05/
Google's PageRank description:
/