TechTip: Be a Web Wiz
Written by Alan Seiden   
Thursday, 19 January 2006

Become a Web wizard with Apache Web server's URL Rewriting, conjuring any links you like.

The Apache Web server, included free on iSeries machines, contains a powerful feature, known as mod_rewrite, that can convert URLs from their original versions (as requested by a Web browser or other client) to any format you find more useful.

This article offers a small taste of what URL Rewriting can do. The possibilities are limitless. The solutions can get complex, too!

Note: Readers are assumed to have some familiarity with Apache.

What It Can Do

URL Rewriting helps make your Web sites and applications more secure and more accessible to users, other applications, and search engines. It allows these improvements without forcing you to change your site or application.

Essential Directives

These directives go in your configuration file (httpd.conf), which you might edit using the iSeries Web-based Web administrator.

  • RewriteEngine: Tells Apache whether to use rewriting. Turn rewriting on with the RewriteEngine On directive.
  • RewriteCond: An optional directive that restricts the RewriteRule immediately following it; the rule runs only if the condition is met. Several RewriteCond lines can be stacked before one rule, and all of them must be met unless a condition carries the [OR] flag. Its syntax is RewriteCond TestString CondPattern, where TestString is the string or server variable to test and CondPattern is a regular expression (a pattern-matching string) that TestString is compared against.
  • RewriteRule: The workhorse of rewriting. Its syntax is RewriteRule Pattern Substitution [flags], where Pattern is a regular expression to match against the incoming URL path, Substitution is the resulting URL you want, and the optional bracketed flags modify the rule's behavior.
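
Put together, the three directives typically appear in the order shown below. This is a minimal sketch rather than a tested configuration: the host names example.com and www.example.com are placeholders, and the rule assumes it sits in the main server or virtual host configuration, where the URL path handed to RewriteRule begins with a slash.

```apache
# Switch the rewriting engine on for this server or virtual host.
RewriteEngine On

# RewriteCond TestString CondPattern
# TestString is the Host header; the condition holds only for the bare
# domain (hypothetical name). NC makes the comparison case-insensitive.
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]

# RewriteRule Pattern Substitution [flags]
# Runs only when the condition above is met: capture the path after the
# leading slash and redirect (R) to the www host, then stop (L).
RewriteRule ^/(.*) http://www.example.com/$1 [R,L]
```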
     

Enhance Security

URL Rewriting can enhance security in many ways, such as showing the public an architecture that hides your server's true directory structure. Another security measure is to require that all users access the site using SSL encryption. Here is how we can enforce SSL encryption:

RewriteEngine On
RewriteCond %{SERVER_PORT} !^443$
RewriteRule ^/(.*) https://%{SERVER_NAME}/$1 [NC,R,L]

If the server port is not 443 (the port normally used for SSL encryption, represented in the browser by the "https" prefix), the rewrite rule redirects the browser to the same site with an "https" prefix. The RewriteRule captures any path information with the wildcard "(.*)", inserts it into the result through the symbol $1, and prefixes it with "https" and the server name. The bracketed flags mean the following: NC = no case (match without regard to case); R = redirect (ask the browser to request the new URL); L = last rule (don't apply any more rewriting rules to the current request).

Example: The original URL is http://www.mytestsite.com. Apache redirects to https://www.mytestsite.com (notice the "s" in "https").
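
A possible variation on the rule above, offered as a sketch under assumptions rather than a tested configuration: a bare R flag produces a temporary (302) redirect, so a site that always wants SSL could state a permanent redirect with R=301. NC is dropped here because the pattern contains no letters whose case could differ. The path /cart/view in the comments is hypothetical.

```apache
RewriteEngine On

# Act only on requests that did not arrive on the SSL port.
RewriteCond %{SERVER_PORT} !^443$

# Example: http://www.mytestsite.com/cart/view (hypothetical path)
#   (.*) captures "cart/view", so the browser is sent to
#   https://www.mytestsite.com/cart/view
# R=301 marks the redirect as permanent; L stops further rule processing.
RewriteRule ^/(.*) https://%{SERVER_NAME}/$1 [R=301,L]
```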

Simplify the URL of Your Home Page

The URL of a dynamically generated home page can be complex. Some software tools require several parameters. This example is from a major retailer's Web site, its name disguised:
http://www.rdfrederick.com/cgi-bin/xyzweb?procfun+homeproc01+pghome+rdf+eng

We should be able to reach the home page by a simple domain name (e.g., http://www.rdfrederick.com). The usual solution is to create a "dummy" home page, reached at the domain name, that uses JavaScript or meta tags to redirect to the dynamic page. This "redirection" approach is slow and awkward. URL rewriting provides a better answer.

RewriteEngine On
RewriteRule ^/$ /cgi-bin/xyzweb?procfun+homeproc01+pghome+rdf+eng [PT,L]

The pattern ^/$ matches a request whose path is nothing but "/", which is what the server receives when a simple domain name is used without any further path or file data. When the rule matches, Apache substitutes the second parameter (/cgi-bin...). There is no "R" inside the brackets, so no redirection takes place; the substitution of the longer URL occurs inside the Web server. Although the proper program (xyzweb with its parameters) is called, the user's browser just shows http://www.rdfrederick.com. Note: The "PT" ("pass through") flag inside the brackets is important; it passes the rewritten result through to any other URL processing that the Web server might have to do, such as Alias or ScriptAlias mapping.
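
The same internal rewrite can give other short addresses to the dynamic page. In the sketch below, /home is a hypothetical alias that does not appear on the retailer's site; it is shown only to illustrate the pattern.

```apache
RewriteEngine On

# Root request: http://www.rdfrederick.com/ arrives as the path "/".
RewriteRule ^/$ /cgi-bin/xyzweb?procfun+homeproc01+pghome+rdf+eng [PT,L]

# Hypothetical alias: /home or /home/ reaches the same program.
# The "/?" makes the trailing slash optional. There is no R flag, so the
# browser keeps showing the short address it requested.
RewriteRule ^/home/?$ /cgi-bin/xyzweb?procfun+homeproc01+pghome+rdf+eng [PT,L]
```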

Fit a Long URL on a Short Screen

The Client Access 5250 emulator provides an easy way to integrate Internet content, such as Web pages and images, with text-based 5250 screens. By default, Client Access recognizes when a URL is displayed, converting it to a clickable link. Clicking a link launches the associated content in the default Web browser. One problem: If the URL is longer than the screen width, which by default is 80 characters (a 24 x 80 screen), some of the URL will be cut off.

For example, our Web-based invoice software could require a long URL that looks like this:
http://www.myinvsite.com/qsys.lib/wwwcgi.lib/softweb.pgm?procfun+myproc+func001+dev+eng+funcparms+stdrentry(A0010):Y+account(A0100):12345+invoice(A0050):22222+line(A0060):43

That's a mouthful! We can reduce it to this dainty (and more readable) URL:
http://www.myinvsite.com/account=12345/invoice=22222/line=43

The conversion is managed with the following directives:

 

RewriteEngine On
RewriteRule /account=(.*)/invoice=(.*)/line=(.*) /qsys.lib/wwwcgi.lib/softweb.pgm?procfun+myproc+func001+dev+eng+funcparms+stdrentry(A0010):Y+account(A0100):$1+invoice(A0050):$2+line(A0060):$3 [PT,L]

Notice the three wildcards "(.*)", which are saved and substituted for the "$1," "$2," and "$3" symbols in the replacement URL. Apache pulls the three values out of the original URL and places them in the replacement URL. The user and Client Access see the short URL, while the Web server processes the long one.
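
One refinement worth considering, sketched here under the assumption that the account, invoice, and line values never contain a slash: "(.*)" matches any characters, slashes included, so a stricter class such as "[^/]+" keeps each captured value within its own path segment. The ^ and $ anchors are also an addition to the original rule; they make the rule ignore URLs that have anything before or after the three segments.

```apache
RewriteEngine On

# ([^/]+) captures one or more characters up to the next slash.
# Request:  /account=12345/invoice=22222/line=43
#   $1 = 12345, $2 = 22222, $3 = 43
RewriteRule ^/account=([^/]+)/invoice=([^/]+)/line=([^/]+)$ /qsys.lib/wwwcgi.lib/softweb.pgm?procfun+myproc+func001+dev+eng+funcparms+stdrentry(A0010):Y+account(A0100):$1+invoice(A0050):$2+line(A0060):$3 [PT,L]
```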

Incidentally, search engines seem to prefer simple URLs over complex ones. A site with long, complex URLs might improve its search engine rankings by simplifying its URLs using this technique.

More Ideas and Information

Many inspiring "Practical Solutions" can be found in the Apache URL Rewriting Guide. The study of regular expressions will aid the aspiring Web wizard, as will the official mod_rewrite documentation.

Readers who have questions, comments, or suggestions about URL Rewriting are encouraged to post comments into the forums discussion located at the end of this article. Share your own wizardry if you like.

Alan Seiden is Senior Developer and Technical Lead at Strategic Business Systems, Inc., in Ramsey, New Jersey, where he helps clients reach their business goals using iSeries, Microsoft, and open-source technology, with an emphasis on usability. Alan is an advisory board member of the New York City Usability Professionals Association.


Last Updated ( Thursday, 19 January 2006 )
 
Discuss (6 posts)
Guest.Visitor
TechTip: Be a Web Wiz
Jan 20 2006 16:14:00
Hans, thanks for the technique.

As you showed, Apache can respond to the same URL request in different ways according to context. URL Rewriting can test numerous variables, including the referrer (as you did), the user agent (browser), and time of day.

Best regards,
Alan Seiden
#119810

H.Boldt
TechTip: Be a Web Wiz
Jan 20 2006 15:38:00
Here's one of my favorite uses of mod_rewrite. A lot of people were linking directly to images on my web site. This is a problem since such linking obscures the photos' true source, as well as causing additional load on my monthly bandwidth allowance.

See the code sample below. What does this do? First, if there is no HTTP_REFERER, it simply accepts the request as is. Second, if the referrer is a page on my site, again, the request is accepted. Otherwise, if the request is to a jpeg or gif file in directory photos or thumbs, the request is converted to a silly small gif image which is very clearly not what the image thief wants.

Some image thieves notice what has happened, but interestingly, many seem oblivious. I recommend using an image that may well provide some embarrassment to the thief.

Note that this won't stop someone from stealing the image and hosting it on another server. Nothing can effectively stop that. But it will mean that you don't suffer any additional bandwidth costs.

Cheers! Hans (http://www.boldts.net/)

Code: http://www.mcpressonline.com/mc/showcode@@.6b337af7/4
#119809
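
The code attached to the post above is not reproduced on this page. A minimal sketch of the logic Hans describes might look like the following; it is not his actual code, and the replacement image path /images/nolink.gif is a placeholder. The host name is taken from the link in his post.

```apache
RewriteEngine On

# 1. An empty Referer header fails this condition, so the rule is skipped
#    and the request is served as is.
RewriteCond %{HTTP_REFERER} !^$

# 2. A referrer on the site itself fails this condition, so the rule is
#    skipped as well.
RewriteCond %{HTTP_REFERER} !^http://(www\.)?boldts\.net/ [NC]

# 3. Otherwise, a jpeg or gif under /photos or /thumbs is answered with a
#    small substitute image (hypothetical path).
RewriteRule ^/(photos|thumbs)/.*\.(jpe?g|gif)$ /images/nolink.gif [NC,PT,L]
```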
Guest.Visitor
TechTip: Be a Web Wiz
Jan 20 2006 15:09:00
Matt,

You are on the right track.

Your RewriteCond directive will only act on requests where the port is NOT 443, i.e., not SSL. You did this to redirect non-SSL requests to SSL. For this reason, it's the wrong place to put the second RewriteRule directive (that you're using to redirect from port 443 to port 80, HTTP). You want that directive to run when the port IS already 443.

One solution is to add a second RewriteCond block that checks for requests from port 443.

I've pasted some code in for you to look at. Although I haven't tested it, it should give you the idea.

Let us know if this works for you.

Good luck,
Alan

Code: http://www.mcpressonline.com/mc/showcode@@.6b337af7/3
#119808
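
The code attached to the reply above is not reproduced on this page. A sketch of the two-block approach it describes, untested and assuming the protected folders are /admin and /cart as in Matt's question, could look like this. A run of RewriteCond lines applies only to the single RewriteRule that follows it, which is why each block carries its own port test.

```apache
RewriteEngine On

# Block 1: a protected folder requested without SSL is redirected to https.
RewriteCond %{SERVER_PORT} !^443$
RewriteRule ^/(admin|cart)(/.*)?$ https://%{SERVER_NAME}/$1$2 [R,L]

# Block 2: anything else requested over SSL is redirected back to http.
# The second condition excludes the protected folders, which prevents a
# redirect loop between the two blocks.
RewriteCond %{SERVER_PORT} ^443$
RewriteCond %{REQUEST_URI} !^/(admin|cart)(/|$)
RewriteRule ^/(.*) http://%{SERVER_NAME}/$1 [R,L]
```

One side effect to watch for: images and style sheets that live outside the protected folders will also be sent back to http when a secure page requests them.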
mshea@javacity.com
TechTip: Be a Web Wiz
Jan 20 2006 13:13:00
Thanks for the great information. This is close to what I've been looking for. One thing I'm still having a hard time with is protecting only certain parts of my web site. I wish to protect certain folders (e.g., www.domain.com/admin, www.domain.com/cart, etc.). The article showed how to do that, but when I go back to the homepage, www.domain.com, the URL stays in https. I've tried a rewrite rule that covers everything except the particular folders I'm protecting, but then I receive error messages in my browser.

Any ideas are greatly appreciated. Matt

Code: http://www.mcpressonline.com/mc/showcode@@.6b337af7/2
#119807
Guest.Visitor
TechTip: Be a Web Wiz
Jan 20 2006 12:44:00
The author welcomes readers' comments and questions about URL Rewriting. What parts of the article were helpful? Confusing? Where could URL Rewriting help you?

Thanks,
Alan Seiden
#119806
MC Press Web Site Staff
TechTip: Be a Web Wiz
Jan 20 2006 16:14:00
This is a discussion about TechTip: Be a Web Wiz.
#119805

