Become a Web wizard with Apache Web server's URL Rewriting, conjuring any links you like.
The Apache Web server, included free on iSeries machines, contains a powerful
feature, known as mod_rewrite, that can convert URLs from their original
versions (as requested by a Web browser or other client) to any format you find
more useful.
This article offers a small taste of what URL Rewriting can
do. The possibilities are limitless. The solutions can get complex, too!
Note: Readers are assumed to have some familiarity with Apache.
What It Can Do
URL Rewriting helps make your Web sites and
applications more secure and more accessible to users, other applications, and
search engines. It allows these improvements without forcing you to change your
site or application.
Essential Directives
These directives go in your configuration file
(httpd.conf), which you might edit using the iSeries Web-based Web
administrator.
- RewriteEngine: Tells Apache whether you wish to use rewriting. Turn
rewriting on with the RewriteEngine On directive.
- RewriteCond: An optional directive that places a condition on the
RewriteRule that follows it. Several RewriteCond directives can precede one
RewriteRule; by default, all of them must be satisfied for the rule to run.
Its syntax is RewriteCond TestString CondPattern, where TestString is the
string or server variable to test and CondPattern is a regular expression
(a powerful pattern-matching string) that represents the test to perform.
- RewriteRule: The workhorse of rewriting. Its syntax is RewriteRule
Pattern Substitution, where Pattern is a regular expression to match against
the incoming URL and Substitution is the resulting URL you want. An optional
bracketed list of flags can follow the Substitution.
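To make the Pattern and Substitution idea concrete, here is a minimal Python sketch that imitates what a single RewriteRule does with a URL path. It is only a model for experimenting with patterns, not Apache's implementation; the function name apply_rule and the /old/ and /new/ paths are hypothetical.

```python
import re

def apply_rule(pattern, substitution, url_path):
    """Imitate one RewriteRule: return the rewritten path, or None if no match."""
    # Like mod_rewrite, search anywhere in the path unless the pattern is anchored.
    match = re.search(pattern, url_path)
    if match is None:
        return None
    # Replace each $N in the substitution with the text captured by group N.
    return re.sub(
        r"\$(\d)",
        lambda ref: match.group(int(ref.group(1))) or "",
        substitution,
    )

print(apply_rule(r"^/old/(.*)", "/new/$1", "/old/page.html"))    # /new/page.html
print(apply_rule(r"^/old/(.*)", "/new/$1", "/other/page.html"))  # None
```

Trying a pattern this way before putting it in httpd.conf can catch a mistake early, although the final test should always be made on the Web server itself.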
Enhance Security
URL Rewriting can enhance security in many ways, such
as showing the public an architecture that hides your server's true directory
structure. Another security measure is to require that all users access the site
using SSL encryption. Here is how we can enforce SSL encryption:
RewriteEngine On
RewriteCond %{SERVER_PORT} !^443$
RewriteRule ^/(.*) https://%{SERVER_NAME}/$1 [NC,R,L]
If the server port is not 443
(the port normally used for SSL encryption, represented in the browser by the
"https" prefix), we run a rewrite rule that redirects the browser to the same
site but with an "https" prefix. The "!" in front of the condition pattern
negates the test, so the condition holds only when the port is something other
than 443. The RewriteRule captures any path information (matched by the
wildcard "(.*)"), inserts it into the result at the symbol $1, and prefixes it
with "https" and the server name. The bracketed options mean the following:
NC=not case sensitive; R=ask the browser to redirect to the new URL; L=last
rule (don't execute any more rewriting rules for the current request).
Example: The original URL is http://www.mytestsite.com. Apache
redirects to https://www.mytestsite.com (notice the "s" in "https").
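The decision those three directives make can be sketched in Python as follows. This is a simplified model under stated assumptions: the function name https_redirect is hypothetical, the port and server name are passed in by hand rather than read from server variables, and query strings are ignored.

```python
import re

def https_redirect(server_port, server_name, url_path):
    """Return the https URL to redirect to, or None if no redirect is needed."""
    # RewriteCond %{SERVER_PORT} !^443$ : the "!" means "run the rule only
    # when the port does NOT match 443".
    if re.search(r"^443$", str(server_port)):
        return None
    # RewriteRule ^/(.*) ... [NC] : capture everything after the leading slash.
    match = re.search(r"^/(.*)", url_path, re.IGNORECASE)
    if match is None:
        return None
    # https://%{SERVER_NAME}/$1
    return "https://" + server_name + "/" + match.group(1)

print(https_redirect(80, "www.mytestsite.com", "/"))
print(https_redirect(443, "www.mytestsite.com", "/"))
```

The first call returns https://www.mytestsite.com/ and the second returns None, because a request that already arrived on port 443 needs no redirect.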
Simplify the URL of Your Home Page
The URL of a dynamically generated home page can be
complex. Some software tools require several parameters. This example is from a
major retailer's Web site, its name disguised:
http://www.rdfrederick.com/cgi-bin/xyzweb?procfun+homeproc01+pghome+rdf+eng.
We should be able to reach the home page by a simple domain name (e.g.,
http://www.rdfrederick.com). The usual solution is to create a "dummy" home
page, reached at the domain name, that uses JavaScript or meta tags to redirect
to the dynamic page. This "redirection" approach is slow and awkward because
the browser must load the dummy page before it can request the real one. URL
rewriting provides a better answer.
RewriteEngine On
RewriteRule ^/$ /cgi-bin/xyzweb?procfun+homeproc01+pghome+rdf+eng [PT,L]
The pattern ^/$ matches a request path that consists of nothing but the
leading slash. The rule therefore finds a match when a simple domain name is
used, without any further path or file data. The rule, having been matched,
will substitute the second parameter (/cgi-bin...). Inside the brackets,
there is no "R," so no redirection takes place. The substitution of the longer
URL occurs inside the Web server. Although the proper program (xyzweb with
parameters) is called, the user's browser just shows http://www.rdfrederick.com.
Note: the "PT" ("pass through") inside the brackets is important; it passes the
rewritten result through to any other processing that the Web server might have
to do.
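A quick way to see which requests the pattern ^/$ catches is to try it against a few paths. The Python sketch below does that; the function name internal_target is hypothetical, and the sample paths other than "/" are made up for illustration.

```python
import re

HOME_PATTERN = re.compile(r"^/$")
HOME_TARGET = "/cgi-bin/xyzweb?procfun+homeproc01+pghome+rdf+eng"

def internal_target(url_path):
    """Return what the server would process for a given request path."""
    # Only the bare "/" matches; every other path is left untouched.
    if HOME_PATTERN.search(url_path):
        return HOME_TARGET
    return url_path

for path in ["/", "/index.html", "/cgi-bin/other"]:
    print(path, "->", internal_target(path))
```

Only the first path is rewritten. Because the model returns a new internal target and never a new address for the browser, it mirrors the absence of the "R" option.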
Fit a Long URL on a Short Screen
The Client Access 5250 emulator provides an easy way
to integrate Internet content, such as Web pages and images, with text-based
5250 screens. By default, Client Access recognizes when a URL is displayed,
converting it to a clickable link. Clicking a link launches the associated
content in the default Web browser. One problem: If the URL is longer than the
screen width, which by default is 80 characters (or a 24 x 80 screen), some of
the URL will be cut off.
For example, our Web-based invoice software could require a long URL that
looks like this:
http://www.myinvsite.com/qsys.lib/wwwcgi.lib/softweb.pgm?procfun+myproc+func001+dev+eng+funcparms+stdrentry(A0010):Y+account(A0100):12345+invoice(A0050):22222+line(A0060):43
That's a mouthful! We can reduce it to this dainty (and more readable) URL:
http://www.myinvsite.com/account=12345/invoice=22222/line=43
The conversion is managed with the following directives:
RewriteEngine On
RewriteRule /account=(.*)/invoice=(.*)/line=(.*) /qsys.lib/wwwcgi.lib/softweb.pgm?procfun+myproc+func001+dev+eng+funcparms+stdrentry(A0010):Y+account(A0100):$1+invoice(A0050):$2+line(A0060):$3 [PT,L]
Notice the three wildcards "(.*)", which are saved and
substituted for the "$1," "$2," and "$3" symbols in the replacement URL. Apache
pulls the three values out of the original URL and places them in the
replacement URL. The user and Client Access see the short URL, while the Web
server processes the long one. Incidentally, search engines seem to
prefer simple URLs over complex ones. A site with long, complex URLs might
improve its search engine rankings by simplifying its URLs using this
technique.
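The capture-and-substitute step can be checked outside Apache with a short Python sketch. It reuses the pattern and replacement from the rule above; the function name expand_short_url is hypothetical, and the sketch models only the text substitution, not the rest of the server's processing.

```python
import re

SHORT_PATTERN = re.compile(r"/account=(.*)/invoice=(.*)/line=(.*)")
LONG_TEMPLATE = (
    "/qsys.lib/wwwcgi.lib/softweb.pgm?procfun+myproc+func001+dev+eng+funcparms"
    "+stdrentry(A0010):Y+account(A0100):$1+invoice(A0050):$2+line(A0060):$3"
)

def expand_short_url(url_path):
    """Return the long internal path for a short URL path, or None if no match."""
    match = SHORT_PATTERN.search(url_path)
    if match is None:
        return None
    result = LONG_TEMPLATE
    # Put each captured value where its $1, $2, or $3 placeholder sits.
    for number, value in enumerate(match.groups(), start=1):
        result = result.replace("$" + str(number), value)
    return result

print(expand_short_url("/account=12345/invoice=22222/line=43"))
```

For the sample short URL, the result is the path portion of the long invoice URL shown earlier. One design note: "(.*)" is greedy and will also match slashes, so a stricter pattern such as "([^/]+)" may be a safer choice if the values can never contain a slash.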
More Ideas and Information
Many inspiring "Practical Solutions" can be found in
the URL
Rewriting Guide. The study of regular expressions will aid the
aspiring Web wizard, as will this
tutorial and the official
mod_rewrite documentation.
Readers who have questions, comments, or
suggestions about URL Rewriting are encouraged to post comments into the forums
discussion located at the end of this article. Share your own wizardry if you
like.
Alan Seiden is Senior Developer and
Technical Lead at Strategic Business
Systems, Inc., in Ramsey, New Jersey, where he helps
clients reach their business goals using iSeries, Microsoft, and open-source
technology, with an emphasis on usability. Alan is an advisory board member of
the New York City Usability Professionals
Association.