TechTip: Be a Web Wiz


The Apache Web server, included free on iSeries machines, contains a powerful feature, known as mod_rewrite, that can convert URLs from their original versions (as requested by a Web browser or other client) to any format you find more useful.

This article offers a small taste of what URL Rewriting can do. The possibilities are limitless. The solutions can get complex, too!

Note: Readers are assumed to have some familiarity with Apache.

What It Can Do

URL Rewriting helps make your Web sites and applications more secure and more accessible to users, other applications, and search engines. It allows these improvements without forcing you to change your site or application.

Essential Directives

These directives go in your configuration file (httpd.conf), which you might edit using the iSeries Web-based Web administrator.

  • RewriteEngine: Tells Apache whether you wish to use rewriting. Turn rewriting on with the RewriteEngine On directive.
  • RewriteCond: An optional directive that restricts the RewriteRule that follows it; the rule runs only if the condition (or every condition in a consecutive series) is met. Its syntax is RewriteCond TestString CondPattern, where TestString is the string or server variable to test and CondPattern is a regular expression (a pattern-matching string) that TestString must match.
  • RewriteRule: The workhorse of rewriting. Its syntax is RewriteRule Pattern Substitution, where Pattern is a regular expression to match the incoming URL and Substitution is the resulting URL you want.
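
As a rough illustration of how the three directives fit together, the sketch below sends visitors from one host name to another. The host names old.example.com and www.example.com are hypothetical and appear nowhere else in this article; treat the fragment as a sample under those assumptions, not a recommended configuration.

```apache
# Illustrative sketch only; old.example.com and www.example.com are hypothetical names.
RewriteEngine On

# Condition: the Host header sent by the browser is old.example.com
# (the NC flag makes the comparison ignore case).
RewriteCond %{HTTP_HOST} ^old\.example\.com$ [NC]

# Rule: runs only when the condition above is met. It captures everything
# after the leading slash as $1 and redirects the browser (R flag) to the
# same path on the new host. L stops further rule processing for this request.
RewriteRule ^/(.*)$ http://www.example.com/$1 [R,L]
```

Without the RewriteCond line, the rule would apply to every request, whatever host name was used.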
     

Enhance Security

URL Rewriting can enhance security in many ways, such as showing the public an architecture that hides your server's true directory structure. Another security measure is to require that all users access the site using SSL encryption. Here is how we can enforce SSL encryption:

RewriteEngine On
RewriteCond %{SERVER_PORT} !^443$
RewriteRule ^/(.*) https://%{SERVER_NAME}/$1 [NC,R,L]

If the server port is not 443 (the port normally used for SSL encryption, represented in the browser by the "https" prefix), we run a rewrite rule that redirects the browser to the same site but with an "https" prefix. In the RewriteCond, the leading "!" negates the pattern, so the condition is met when the port does not match ^443$. The RewriteRule takes any path information (matched by the wildcard "(.*)"), substitutes it into the result by the symbol /$1, and prefixes it with "https" and the server name. The bracketed options mean the following: NC=not case sensitive; R=ask the browser to redirect to the new URL; L=last rule (don't execute any more rewriting rules for the current request).

Example: The original URL is http://www.mytestsite.com. Apache redirects to https://www.mytestsite.com (notice the "s" in "https").
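
By default, the R flag sends a temporary (302) redirect. If the move to SSL is meant to be permanent, a status code can be supplied with the flag. The following is a minimal sketch that assumes the same port test as the rule above; NC is left out because the pattern contains no letters.

```apache
RewriteEngine On
# Same test as before: the request did not arrive on the SSL port.
RewriteCond %{SERVER_PORT} !^443$
# R=301 asks for a permanent redirect instead of the default temporary (302) one.
RewriteRule ^/(.*) https://%{SERVER_NAME}/$1 [R=301,L]
```

Browsers can cache a permanent redirect, so it may be worth testing with the plain R flag first.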

Simplify the URL of Your Home Page

The URL of a dynamically generated home page can be complex. Some software tools require several parameters. This example is from a major retailer's Web site, its name disguised:
http://www.rdfrederick.com/cgi-bin/xyzweb?procfun+homeproc01+pghome+rdf+eng

We should be able to reach the home page by a simple domain name (e.g., http://www.rdfrederick.com). The usual solution is to create a "dummy" home page, reached at the domain name, that uses JavaScript or meta tags to redirect to the dynamic page. This redirection approach is slow and awkward, because the browser must load the dummy page before it requests the real one. URL rewriting provides a better answer.

RewriteEngine On
RewriteRule ^/$ /cgi-bin/xyzweb?procfun+homeproc01+pghome+rdf+eng [PT,L]

The pattern ^/$ matches a request path that consists of only a slash, which is what the browser sends when the simple domain name is used without any further path or file data. The rule, having been matched, will substitute the second parameter (/cgi-bin...). Inside the brackets, there is no "R," so no redirection takes place. The substitution of the longer URL occurs inside the Web server. Although the proper program (xyzweb with parameters) is called, the user's browser just shows http://www.rdfrederick.com. Note: the "PT" ("Pass through") inside the brackets is important; it passes the rewritten result through to any other processing that the Web server might have to do.
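
Some visitors may still arrive with a bookmark to an older static home page. Assuming that page was called /index.html (a hypothetical name; the retailer's actual file name is not known), the pattern could be widened to cover both cases. This is a sketch under that assumption only.

```apache
RewriteEngine On
# Matches "/" alone or "/index.html" (hypothetical file name).
# The backslash makes the dot literal; "(...)?" makes the file name optional.
RewriteRule ^/(index\.html)?$ /cgi-bin/xyzweb?procfun+homeproc01+pghome+rdf+eng [PT,L]
```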

Fit a Long URL on a Short Screen

The Client Access 5250 emulator provides an easy way to integrate Internet content, such as Web pages and images, with text-based 5250 screens. By default, Client Access recognizes when a URL is displayed, converting it to a clickable link. Clicking a link launches the associated content in the default Web browser. One problem: If the URL is longer than the screen width (80 characters on the default 24 x 80 screen), some of the URL will be cut off.

For example, our Web-based invoice software could require a long URL that looks like this:
http://www.myinvsite.com/qsys.lib/wwwcgi.lib/softweb.pgm?procfun+myproc+func001+dev+eng+funcparms+stdrentry(A0010):Y+account(A0100):12345+invoice(A0050):22222+line(A0060):43

That's a mouthful! We can reduce it to this dainty (and more readable) URL:
http://www.myinvsite.com/account=12345/invoice=22222/line=43

The conversion is managed with the following directives:

 

RewriteEngine On
RewriteRule /account=(.*)/invoice=(.*)/line=(.*) /qsys.lib/wwwcgi.lib/softweb.pgm?procfun+myproc+func001+dev+eng+funcparms+stdrentry(A0010):Y+account(A0100):$1+invoice(A0050):$2+line(A0060):$3 [PT,L]

Notice the three wildcards "(.*)", which are saved and substituted for the "$1," "$2," and "$3" symbols in the replacement URL. Apache pulls the three values out of the original URL and places them in the replacement URL. The user and Client Access see the short URL, while the Web server processes the long one.
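
Because "(.*)" matches any characters, including slashes, and the pattern is not anchored, the rule above also fires on URLs that merely contain this sequence somewhere in the path. If the three values are always numeric (an assumption; the invoice software may allow other characters), a stricter variant might look like the sketch below.

```apache
RewriteEngine On
# Assumes purely numeric account, invoice, and line values.
# ^ and $ anchor the pattern to the whole path; [0-9]+ cannot run past a slash.
RewriteRule ^/account=([0-9]+)/invoice=([0-9]+)/line=([0-9]+)$ /qsys.lib/wwwcgi.lib/softweb.pgm?procfun+myproc+func001+dev+eng+funcparms+stdrentry(A0010):Y+account(A0100):$1+invoice(A0050):$2+line(A0060):$3 [PT,L]
```

With this version, a request such as /account=abc/invoice=1/line=2 no longer matches and is handled by whatever other rules or files the server has.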

Incidentally, search engines seem to prefer simple URLs over complex ones. A site with long, complex URLs might improve its search engine rankings by simplifying its URLs using this technique.

More Ideas and Information

Many inspiring "Practical Solutions" can be found in the URL Rewriting Guide. The study of regular expressions will aid the aspiring Web wizard, as will this tutorial and the official mod_rewrite documentation.

Readers who have questions, comments, or suggestions about URL Rewriting are encouraged to post comments into the forums discussion located at the end of this article. Share your own wizardry if you like.


 
