The Linux Letter: Frying Spam

Spam. Also known as unsolicited commercial email (UCE). We all hate it and despise those who create it, but as sys admins, we're all forced to contend with it. I receive more than 2,000 email messages every day (mostly from mailing lists), and my statistics show that fully 25 percent of them are spam. That's not surprising, since my email address appears on public Web sites and in USENET newsgroups, where it has undoubtedly been harvested many times over. Furthermore, my address has been the same since AT&T purchased IBM's Internet business, which makes it at least a decade old, so who knows how many spamming lists it appears on.

When the spammers started filling my mailbox, I was determined not to change my address (the standard method many use to curtail spam) but instead to investigate ways to cut it off at the pass. Most of you already have anti-spam measures in place, but for those who don't or for those who want additional protection, I'm going to describe some basic theory and some open-source tools you can employ to build a spam/virus filtering appliance. In fact, some commercial appliances use the techniques and open-source software that I'm going to describe here, but they hide that fact behind a custom interface. You can either purchase one of these or cut out the middleman and roll your own. Even if you decide not to build an appliance, you may pick up some ideas that can be applied to your existing email scrubber to enhance its effectiveness. Let's dive in.

Bare Bones First

I build all of my mail servers and smart host appliances (which sit in front of your mail servers) with Red Hat Enterprise Linux or CentOS, a Red Hat derivative. I chose these two because I want my systems to be built for an extended lifetime, which both of these provide with long-term support, and I want popular distributions so that I can readily find pre-built software packages bundled in RPM format. If you're a Novell shop, you will find similar capabilities with SuSE Linux. This isn't to say that any of the other multitudes of Linux distros (or *BSDs) aren't capable, only that, for me, an enterprise-quality Linux distro is preferred.

As for the mail transfer agent (MTA), I eschew Sendmail and instead use Postfix. Both can be loaded during installation, and by using the system-switch-mail command, I can configure the system to use Postfix. Without getting into a great religious argument, I chose Postfix because its author, Wietse Venema, successfully designed it to be extremely secure and extensible. Postfix has UCE controls built in, and Wietse's choice of plain-text configuration files earned it extra points during my selection, too. Sendmail gurus are equally adept at doing what I propose using their favorite MTA and ancillary software, but if you compare a Sendmail configuration file with a Postfix one, you'll quickly see why I chose the latter.

There is plenty of documentation on how to install Linux as well as how to do a basic configuration of Postfix, so I won't repeat that information here. I would, however, encourage you to configure a bare bones Postfix instance and get it successfully accepting/sending and forwarding mail prior to tweaking it for sentry duty. Doing so will make your life so much easier.

Barbarians at the Gate

Having configured our basic Postfix server, we can now turn our attention to the customizations that will turn it into a high-performance, spam-eating machine.

The best way to fight spam is simply not to accept it in the first place, thus stopping the barbarians at the gate and minimizing the impact on your mail server. But how does one accomplish this? Separating spam from ham (the term for legitimate email) involves a delicate balancing act. On the one hand, you want to be as aggressive as possible in eliminating spam so that your users don't have to wade through excessive amounts, yet on the other hand, you don't want ham messages getting erroneously rejected, thus potentially costing you business. Fortunately, many spammers make it easy to tip the balance in our favor by ignoring the specifications laid out in Internet Request for Comments (RFC) 2821, Simple Mail Transfer Protocol. (For those unaware, the RFCs are the blueprints for the protocols that make the Internet go. A quick Google search will turn up further information.)

Try Again Later

We all know that the email system was designed to be forgiving of unreachable servers and that an email sent today may be held by your MTA until it can be delivered. The first line of defense that I implement is called "greylisting." For that, I use a package called Postgrey. I'm sure that everyone is familiar with the concepts of blacklisting (if you're on the list, I don't accept your email) and whitelisting (if you're on the list, I accept your email). Greylisting is a technique that takes advantage of the fact that a properly configured MTA will make multiple attempts to deliver an email before returning it as undeliverable.

The process is very simple. An MTA connects to my Postfix instance and starts the transmission, giving Postfix the envelope information (the sender and recipient addresses). Postfix hands this information, along with the client's IP address, off to Postgrey, which checks to see if it has ever seen the triplet client_ip/sender/recipient before. If it has, or if the sender or domain is Postgrey-whitelisted, then Postgrey returns an "Okay" message back to Postfix, instructing it to accept the message. If the triplet is new, Postgrey will cache the triplet along with a time stamp and then tell Postfix to reject the message with a temporary failure. Postgrey will continue to reject the message until an admin-configurable length of time has passed. A legitimate MTA will hold the email and try again later, but an illegitimate (or improperly configured) MTA will not. This is the case with the majority of home users' computers that have been turned into spam-sending zombies. The simple MTA running on these machines will simply move along to the next victim on the list, leaving your system alone.
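The decision described above can be sketched in a few lines of Python. This is only an illustration of the greylisting idea, not Postgrey's actual code: the class name, the in-memory dictionary, the 300-second delay, and the injectable clock are all assumptions made for the example.

```python
import time


class Greylist:
    """Toy in-memory greylist keyed on (client_ip, sender, recipient)."""

    def __init__(self, delay_seconds=300, whitelist=(), clock=time.time):
        self.delay_seconds = delay_seconds  # assumed delay; admin-configurable
        self.whitelist = set(whitelist)     # sender addresses or domains
        self.first_seen = {}                # triplet -> time of first attempt
        self.clock = clock                  # injectable so it can be tested

    def check(self, client_ip, sender, recipient):
        """Return "OK" to accept or "DEFER" for a temporary failure."""
        domain = sender.rpartition("@")[2]
        if sender in self.whitelist or domain in self.whitelist:
            return "OK"

        triplet = (client_ip, sender, recipient)
        now = self.clock()
        # Remember when the triplet first appeared; later attempts reuse
        # that original time stamp.
        first = self.first_seen.setdefault(triplet, now)
        if now - first >= self.delay_seconds:
            return "OK"
        return "DEFER"
```

A sender that retries after the delay gets through, while a zombie that never comes back only ever sees the temporary failure. A real implementation would also persist the table and expire old entries.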

What's nice is that Postgrey will automatically whitelist email addresses that it frequently sees, so messages from people with whom you correspond regularly will be delivered without delay. Should such a person not contact you for an extended (user-configurable) period of time, that person will be removed from your whitelist. You also may add users or domains to a permanent whitelist so that mail from them will be accepted immediately, if you so desire.

I'm sure that as this technique gains in popularity, the zombie writers will make their programs more sophisticated so that they act like real MTAs (as required by the RFC). Until that happens, I revel in the relative calm that greylisting has given me.

RFC Violations

If the SMTP server trying to deliver spam to my server is persistent enough to get past Postgrey, it still has some hoops to jump through before the message will be delivered. Postfix has a bevy of anti-UCE capabilities that can easily be switched on under the configuration option "smtpd_recipient_restrictions". Through their use, I can ensure that Postfix will thoroughly inspect the transaction between the two servers and that the transaction, and subsequent envelope information, doesn't "smell fishy." To that end, I configure Postfix to ensure...

  • that the envelope sender and recipient (From and To) information contains fully qualified domain names. That allows it to verify that the recipient is a valid user for the domain(s) that we serve and that if there is a problem, there is a way to bounce the message back to the sender. This is done via the "reject_non_fqdn_recipient" and "reject_non_fqdn_sender" arguments.
  • that the domain names provided are valid, through a DNS lookup. This is done via the arguments "reject_unknown_sender_domain" and "reject_unknown_recipient_domain". A spammer can use bogus domain names (and many do) to pass the first test. This one at least requires them to make an effort to use real domain names.
  • that any connecting client provides a fully qualified and valid host name in its HELO/EHLO greeting, using the "reject_non_fqdn_hostname" and "reject_invalid_hostname" arguments (renamed "reject_non_fqdn_helo_hostname" and "reject_invalid_helo_hostname" as of Postfix 2.3). While I can't rely on the host name having a DNS record (thus thwarting any attempt to validate the host), an absence of the host name could be indicative of a spambot or an improperly configured mailer. The same applies to a host name that contains invalid characters. Since implementing the anti-UCE measures, I've had only one instance of legitimate mail being blocked because of a lack of a host name. I had a chat with that sender's system administrator, who fixed his configuration.
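To make the syntax checks in this list concrete, here is a rough Python approximation of what they test. It is a sketch, not Postfix's own logic: the function names are invented for the example, address literals such as [192.0.2.1] are ignored, an empty (null) sender is assumed to pass, and the DNS-based checks are left out because they need network access.

```python
import re

# One DNS label: letters, digits, and hyphens, 1-63 characters,
# with no hyphen at either end.
_LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def _strip_root(name):
    """Drop a single trailing dot (the DNS root), if present."""
    return name[:-1] if name.endswith(".") else name


def is_valid_hostname(name):
    name = _strip_root(name)
    if not name or len(name) > 253:
        return False
    return all(_LABEL.fullmatch(label) for label in name.split("."))


def is_fqdn(name):
    """Valid host name with at least two labels, e.g. mail.example.com."""
    return is_valid_hostname(name) and "." in _strip_root(name)


def domain_of(address):
    """Return the part after the last '@', or '' if there is none."""
    _local, at, domain = address.rpartition("@")
    return domain if at else ""


def failed_checks(helo, mail_from, rcpt_to):
    """List the restriction names (as used in the text) that would fire."""
    failures = []
    if not is_valid_hostname(helo):
        failures.append("reject_invalid_hostname")
    elif not is_fqdn(helo):
        failures.append("reject_non_fqdn_hostname")
    # Assumption: the null sender used for bounces is not rejected.
    if mail_from and not is_fqdn(domain_of(mail_from)):
        failures.append("reject_non_fqdn_sender")
    if not is_fqdn(domain_of(rcpt_to)):
        failures.append("reject_non_fqdn_recipient")
    return failures
```

For example, a client that greets with "localhost" trips only the fully-qualified host name check, while one that greets with a name containing punctuation such as "!" trips the invalid host name check.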

Most SMTP servers (and by default, Postfix) are tolerant of some garbage envelope entries as long as they can deduce what was requested. In addition to the aforementioned RFC requirements, I made the decision to enable the Postfix "strict_rfc821_envelopes" switch, which makes it intolerant of mailers whose email envelope (From, To, etc.) information doesn't strictly conform to the RFC. The Postfix documentation warns that many mailers produce junk envelopes and that enabling strict envelopes will cause their mail to get rejected. I considered the implications and decided that I shouldn't have to tolerate junk mail as a result of poor design, thus my decision to adopt this draconian policy. I have yet to have any complaints from my users. My guess is that as a result of the UCE scourge, I'm not the only one taking this posture, thus authors of email software are finally toeing the line and following the specs.

It's amazing to me how many spam messages get blocked using just these techniques. I extracted a couple of hours' worth of email logs and did some quick analysis. Of the 25,258 entries, 515 were rejection messages. Of those, 283 were greylist rejections, and 139 were rejected because of RFC violations. Some of the sources for the remainder are discussed later.
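For anyone who wants to see the proportions behind those figures, the arithmetic can be reproduced as follows. The numbers come straight from the paragraph above; the variable names are just for illustration. Keep in mind that these are log entries, not messages, so the first percentage is not a spam rate.

```python
total_entries = 25_258
rejections = 515
by_cause = {"greylisting": 283, "RFC violations": 139}

# Whatever is not accounted for by the two named causes.
by_cause["other"] = rejections - sum(by_cause.values())


def share(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


print(f"rejections: {share(rejections, total_entries):.2f}% of log entries")
for cause, count in by_cause.items():
    print(f"{cause}: {count} ({share(count, rejections):.2f}% of rejections)")
```

Greylisting accounts for a little over half of the rejections, RFC violations for a bit more than a quarter, and 93 rejections remain for other causes.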

Real-Time Black Listing

Even if the email envelope conforms to the RFCs, it doesn't mean that I'm willing to accept the message. Many emails pass the RFC tests but are delivered by known spam hosts and can therefore be rejected. The use of blacklists can be controversial (at least for those who have ended up on the list), but I find the one run by the Spamhaus Project to be very good.

Configuring Postfix to avail itself of those resources is simple. Just add another argument, "reject_rbl_client sbl-xbl.spamhaus.org", to the "smtpd_recipient_restrictions" directive, and Postfix will check the inbound server's IP against the list. If it's found, the message will be rejected. If not, the message will pass unmolested. All of this for the cost of a single line in my configuration file and one DNS lookup. Since I also run a caching DNS server on my spam appliances, subsequent attempts to send mail from the same IP will be rejected without any DNS requests even leaving the box. In my impromptu statistics, 98 messages were rejected because they were coming from known spammers.
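The "one DNS lookup" works the way DNS blacklists generally do: the client's IPv4 octets are reversed and prepended to the list's zone, and the resulting name is looked up. An answer means the address is listed; a "no such name" response means it isn't. The sketch below only builds the query name; the function name is made up for the example, and the zone is the one named in this article, so check the list operator's current documentation before relying on it.

```python
import ipaddress


def dnsbl_query_name(client_ip, zone="sbl-xbl.spamhaus.org"):
    """Build the DNS name a blacklist lookup would query for an IPv4 client."""
    addr = ipaddress.IPv4Address(client_ip)  # raises ValueError if malformed
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


# 192.0.2.99 is a documentation-only address, used here purely as an example.
print(dnsbl_query_name("192.0.2.99"))  # 99.2.0.192.sbl-xbl.spamhaus.org
```

Because the query is an ordinary DNS name, a local caching resolver answers repeat lookups for the same client without any traffic leaving the box.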

That Personal Touch

Postfix allows you to use regular expressions and hash tables to further define what is acceptable or not. In other words, you can fine-tune Postfix's behavior to your company's requirements. For example, you may not want some or all of these rules to apply if you're sending outbound or intra-office mail. For instance, you may not want to spend the CPU cycles and network bandwidth to do DNS lookups if the email is outbound.

Furthermore, you may want to inspect subject lines or domain names and cut messages off at the pass if you find them objectionable. As an example, we have a couple of companies that continue to send us unwanted email, in spite of being notified that it's unwanted. Sure, there's the CAN-SPAM Act (pronounced Can Spam!) that I suppose I could invoke to get the mail stopped, but quite frankly, it's much easier to add one line to a config file than it is to jump through hoops to get the government to take action.
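The "one line in a config file" amounts to a pattern paired with an action, checked in order until something matches. Here is a small Python sketch of that idea. It imitates the behavior rather than Postfix's table syntax, and the domain names, rule list, and function name are all hypothetical.

```python
import re

# Each rule pairs a pattern with an action. The domains are placeholders.
RULES = [
    (re.compile(r"@(.+\.)?pushy-vendor\.example$", re.IGNORECASE), "REJECT"),
    (re.compile(r"@(.+\.)?trusted-partner\.example$", re.IGNORECASE), "OK"),
]


def sender_action(sender, rules=RULES, default="DUNNO"):
    """Return the action of the first rule whose pattern matches the sender.

    "DUNNO" here means "no opinion; let the remaining checks decide".
    """
    for pattern, action in rules:
        if pattern.search(sender):
            return action
    return default
```

With these rules, mail from sales@pushy-vendor.example or any of its subdomains is refused outright, while an address that matches nothing falls through to the other restrictions.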

There's More

Much of this article touched on Postfix's UCE control options, but there's more. As I mentioned earlier, Wietse did a great job designing this MTA to be extensible, so you can add functionality to it.

Even with all of the controls I have in place, the spam still sometimes gets through. To combat it, I took advantage of the extensibility. Once Postfix has given its blessing to an email, it pipes it to DSPAM (another open-source project), which categorizes the mail as ham or spam. DSPAM was one of the first to use heuristics to categorize spam. It doesn't simply look at headers or content for obvious giveaway terms but, instead, looks at all of the terms and their relationships. DSPAM learns what to look for based on the initial training you give it (with a corpus of ham and spam) and by subsequent corrections you provide to it.

Once the message has been accepted and categorized, it has one final trial: It's piped through two anti-virus packages (F-Prot and the open-source ClamAV) before finally being delivered to the user. You can customize this ad infinitum, both from within Postfix and with external programs, providing as much content-filtering as your situation requires.

A Great Payoff

I have been very pleased with the performance of our email system. The cost to provide all of this functionality was minimal—and certainly less than the cost of a commercial appliance. What is more important to me than the initial cost, however, is my ability to reconfigure and tweak this thing as our needs change. Right now, the system we're running is more than enough to handle our load, and the nature of all of the software makes it possible to distribute the workload across many servers, if need be.

You're welcome to visit all of the links I've provided to see what you're getting into. For those who want to hop into the express line, I recommend The Book of Postfix: State-of-the-Art Message Transport by Ralf Hildebrandt and Patrick Koetter. It contains all the information you'll need, with much of it in cookbook form.

If you want a fun project with a great payoff to your company, I'd suggest that you consider building one of these appliances. You'll learn quite a bit about how the Internet email system is supposed to work, and you'll end up with a versatile system that will support your electronic correspondence needs for the long term. Enjoy!

Barry L. Kline is a consultant and has been developing software on various DEC and IBM midrange platforms for over 23 years. Barry discovered Linux back in the days when it was necessary to download diskette images and source code from the Internet. Since then, he has installed Linux on hundreds of machines, where it functions as servers and workstations in iSeries and Windows networks. He co-authored the book Understanding Linux Web Hosting with Don Denoncourt.