Yavor Atanasov web portfolio

Weblog

2010
02.16

Apache URL mod_rewrite

URL rewriting and redirecting help web developers create websites that are friendlier to users and search engines, and even more secure. The URL is the pathway that leads the user (be it a human or a machine) to the valuable information your website has to offer. The longer and “dirtier” this pathway is, the more difficult it is for your information to be discovered and reached. That is not something you want for your website.

There are several scenarios in which you as a developer might want to use URL rewriting and redirecting. You might need to:

  • tidy up messy URLs which are the side effect of data intensive dynamic websites
  • temporarily redirect incoming requests due to maintenance
  • permanently redirect a certain request due to a changed location of a specific resource

The Apache web server offers an elegant and very powerful solution to URL manipulation in the form of the mod_rewrite module. It uses regular expressions to match a requested URL and map it to the corresponding resource location on your server. It can be configured on a per-server basis (via httpd.conf) or on a per-directory basis (using .htaccess files). Per-server configuration is the recommended way (provided you have access to the server settings), mostly because .htaccess files are looked up and read on every request and so carry a performance overhead. However, when you do not have direct access to your server configuration, .htaccess files do the trick pretty well. Let’s have a look at how mod_rewrite would tackle the three scenarios mentioned above.

Messy URLs

Dynamic website architectures by default come with a price – long URL paths containing variables and values that the server-side scripts need to get what the user requested.

Messy URL example: www.somewebsite.com/guitars.php?type=electric&brand=esp&year=2010

That is what our web application needs to know to get to the requested information. However, it is something the user does not care about and does not need to see or know. A much more visually appealing and easily remembered way of presenting the URL above would be:

Tidy URL example: www.somewebsite.com/guitars/electric/esp/2010

There are no special characters, no equal signs and no file extensions. It is more logical to the human eye and certainly more understandable to search engines. Now let’s see how Apache’s mod_rewrite can be used to map the tidy URL the user inputs to the “messy” URL your web application needs. All you need to do is create a .htaccess file in your base directory (or wherever the application is installed) and put the following magic code in it:

RewriteEngine On
RewriteRule ^([a-z]+)/([a-z]+)/([a-z]+)/([0-9]{4})$ /$1.php?type=$2&brand=$3&year=$4 [L]

The RewriteEngine directive controls the status of the rewriting engine, that is, whether it is On or Off. The RewriteRule directive takes three arguments separated by spaces:

  • pattern – a regular expression to match the URL to be rewritten
  • substitution – the location where the matched URL should be sent to
  • flags – an optional set of options that modify how the rule is applied

In our case the pattern part consists of – ^([a-z]+)/([a-z]+)/([a-z]+)/([0-9]{4})$
Let’s break it down:

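^ – this special character anchors the match to the beginning of the string being tested. Note that mod_rewrite does not test the host name (www.somewebsite.com); in a .htaccess file the pattern is matched against the URL path relative to the directory, without the leading slash. In our case that string is guitars/electric/esp/2010
[a-z]+ – this regular expression matches any string of one or more lowercase letters. In our case it matches guitars, electric, esp.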
[0-9]{4} – this matches any combination of exactly 4 digits. We need this to match the year 2010.
Those regular expression segments are enclosed in parentheses (). This way they can be back-referenced in the substitution argument using $1, $2, $3, $4
$ – this indicates the end of the string

The substitution argument in our RewriteRule directive is – /$1.php?type=$2&brand=$3&year=$4. The variables $1, $2, $3, $4 are back-references to the pattern segments enclosed in parentheses, and in our case they hold the values guitars, electric, esp, 2010.
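
As a variation on the rule above, here is a minimal sketch of a slightly more tolerant pattern. It assumes, purely for illustration, that brand names may contain digits and hyphens and that a trailing slash should be accepted; neither requirement comes from the example site.

```apache
RewriteEngine On
# Same mapping as before, with two assumed relaxations:
#   [a-z0-9-]+  lets the brand segment contain digits and hyphens
#   /?          accepts an optional trailing slash before the end of the string
RewriteRule ^([a-z]+)/([a-z]+)/([a-z0-9-]+)/([0-9]{4})/?$ /$1.php?type=$2&brand=$3&year=$4 [L]
```

With this rule, both guitars/electric/esp/2010 and guitars/electric/esp/2010/ should end up at /guitars.php?type=electric&brand=esp&year=2010.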

The flag argument we are using is [L]. The L flag (“L” for last) tells the rewrite engine to stop processing further if the current rule is matched. More on the different flags you can use with mod_rewrite can be found on the Apache mod_rewrite Flags page.
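
For comparison, the same rule could be placed in the per-server configuration mentioned earlier. The sketch below assumes a virtual host in httpd.conf; the DocumentRoot path is hypothetical. The one notable difference is that in server or virtual host context the pattern is matched against the full URL path, so it starts with a slash.

```apache
<VirtualHost *:80>
    ServerName www.somewebsite.com
    # Hypothetical path; use your site's real document root
    DocumentRoot "/var/www/somewebsite"

    RewriteEngine On
    # Note the leading slash after ^, which is absent in the .htaccess version
    RewriteRule ^/([a-z]+)/([a-z]+)/([a-z]+)/([0-9]{4})$ /$1.php?type=$2&brand=$3&year=$4 [L]
</VirtualHost>
```

Apache has to be restarted or reloaded for changes to httpd.conf to take effect, whereas a .htaccess file is picked up on the next request.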

Temporary redirects due to maintenance

Whenever a website needs to undergo a major maintenance procedure that would hinder its normal workings and possibly throw errors if someone tries to access it, it is recommended that the website is put offline until maintenance is finished. Putting a website offline requires informing its visitors why they cannot access it or parts of it so they know they can come back later. This is important because some of those visitors may be search engine crawlers and unless they are told to come back later, they will record the 404 errors your previously indexed URLs throw at them.

The elements of a temporary maintenance redirect with Apache are as follows:

  • Customized maintenance message page
  • A .htaccess file with directives to redirect all incoming requests to the maintenance page and throw the necessary http status code

This is what the .htaccess file might look like:

RewriteEngine On
RewriteCond %{REMOTE_ADDR} !^11\.111\.11\.111
RewriteRule ^(.*)$ maintenance.php [L]

The RewriteCond directive defines a condition under which the rewrite should be executed, and it must be placed directly before the RewriteRule it applies to. In our case we use RewriteCond with the server variable %{REMOTE_ADDR}, which holds the IP address from which the website is accessed. The leading ! negates the pattern, so our condition tells Apache to rewrite all requests except the ones coming from IP 11.111.11.111 (this could be the IP address of the machine that is used for testing the website). All requests coming from other IP addresses are internally rewritten to our maintenance.php page.
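
If more than one machine needs to get past the maintenance page, several conditions can be stacked; consecutive RewriteCond lines are combined with a logical AND unless the [OR] flag is given. A minimal sketch, assuming a second, hypothetical tester address 22.222.22.222:

```apache
RewriteEngine On
# Rewrite only when the client is neither of the two tester addresses.
# The trailing $ anchors each pattern to the whole address.
RewriteCond %{REMOTE_ADDR} !^11\.111\.11\.111$
RewriteCond %{REMOTE_ADDR} !^22\.222\.22\.222$
RewriteRule ^(.*)$ maintenance.php [L]
```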

Another important element of the redirect is the http status code that has to be sent with the maintenance message. That’s because the maintenance message is important for people, but the search engine crawlers understand the http status code in the header. Basically there are two options from the list of http status codes:

  • 302 (found, originally “moved temporarily”) – it tells the requester that the resource temporarily resides at a different location, but the original location should be kept for future access
  • 503 (service unavailable) – this tells the user agent that the server is unavailable due to overload or maintenance and it should come back later

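I personally prefer the 503 status code because it better describes the essence of the maintenance state. The HTTP status code can be sent via the PHP header() function in the maintenance.php file. Example:

<?php
// Tell user agents and crawlers that the outage is temporary
header("HTTP/1.1 503 Service Unavailable");
// Suggest retrying in an hour (value in seconds), matching the message below
header("Retry-After: 3600");
echo "Server is down for maintenance. Please come back in an hour.";
?>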
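
If the maintenance page is static, Apache itself can send the 503 status instead of PHP. The following is a sketch under assumptions: maintenance.html is a hypothetical static page in the document root, and the tester address is the one used earlier.

```apache
# Serve this page as the body of every 503 response
ErrorDocument 503 /maintenance.html

RewriteEngine On
# Let the tester through, as in the earlier example
RewriteCond %{REMOTE_ADDR} !^11\.111\.11\.111$
# Do not answer the request for the maintenance page itself with another 503
RewriteCond %{REQUEST_URI} !^/maintenance\.html$
# "-" means no substitution; a non-3xx code in R makes Apache return that status
RewriteRule ^ - [R=503,L]
```

The second condition matters because Apache fetches the error document through an internal request, which would otherwise be caught by the same rule.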

Permanent redirect due to changed location

Although frequent change of resource locations is not recommended, sometimes it is a needed action. In that case one needs to make sure that the user’s bookmarks of the old location will still work, and also inform the user agent that the change is permanent via an appropriate http status code. That http code is 301 (moved permanently). It tells the user agent to update its records of the location of that specific resource.
Let’s say we want to redirect all requests for http://somewebsite.com/old.php?var=value to http://somewebsite.com/new.php?newvar=value. Then the .htaccess file in this case might look like this:

RewriteEngine On
RewriteCond %{QUERY_STRING} var=(value)
RewriteRule ^old\.php$ http://somewebsite.com/new.php?newvar=%1 [R=301]

Again the RewriteCond directive is used, this time with the %{QUERY_STRING} variable, which holds the GET arguments passed via the URL. The value captured by the parentheses in the condition is available in the substitution as %1. This way any request to /old.php?var=value will be redirected to /new.php?newvar=value. That, however, is a rather rigid redirect since it will match only one specific variable name and value. We can make the redirect a lot more flexible using a short but powerful regular expression:

RewriteEngine On
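# Capture the value of the first GET argument, whatever its name...
RewriteCond %{QUERY_STRING} ^[^=]+=([^&]*) [OR]
# ...or accept a request with no query string at all (%1 is then empty)
RewriteCond %{QUERY_STRING} ^$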
RewriteRule ^old\.php$ http://somewebsite.com/new.php?newvar=%1 [R=301]

This RewriteCond will match any call to the old.php page regardless of the name and value of the arguments passed. It will also match a call to the old.php page without any GET arguments supplied.
The flag argument of the RewriteRule directive is [R=301]. The R flag (“R” for redirect) redirects the request with the supplied http status code.
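
Flags can be combined by separating them with commas. As a sketch, the first redirect above could be written with both R and L, so that no later rules are evaluated once the redirect has been decided; the anchored query string pattern is an added assumption about the exact match wanted.

```apache
RewriteEngine On
# Anchored so that only the exact query string var=value matches
RewriteCond %{QUERY_STRING} ^var=(value)$
# R=301 sends a permanent redirect; L stops processing further rules
RewriteRule ^old\.php$ http://somewebsite.com/new.php?newvar=%1 [R=301,L]
```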
