Return to Sender: 8 Easy Spam Filter Fixes
False positives are the scourge of spam filters. Whether you're sending or receiving, here are some steps you can take to keep good e-mail out of the slush pile.
Spam. It fills our in-boxes, wastes our time and spreads malware -- and it's only getting worse. According to Ferris Research, which studies messaging and content control, 40 trillion spam messages are expected to be sent in 2008, costing businesses more than $140 billion worldwide -- a significant increase from the 18 trillion spam messages sent in 2006 and the 30 trillion in 2007.
In theory, e-mail filtering software and appliances allow "good" or "true" e-mail messages to pass through while prohibiting spam. But the filters can err in either of two ways: They can mistakenly allow spam to pass through, believing it to be true e-mail (known as a "false negative" situation), or they can mistakenly block true e-mail, believing it to be spam (a "false positive").
Typically, after identifying a message as spam, the filtering software either blocks it outright or places it in a quarantine folder, allowing the recipient to review it later. Although the latter method provides a chance to retrieve false positives, it requires time and effort from the user -- and some users never bother to check their quarantine folders at all.
Users and organizations that receive spam incur a cost in deleting it -- about $0.04 per message, according to Ferris Research. But Ferris analyst Richi Jennings points out that the cost of locating a missing true e-mail is far greater -- about $3.50 per message.
(Ferris developed these figures using published data on such factors as labor size and hourly labor costs, then applied its own estimates, such as the percentage of workforces having e-mail access and volumes of spam messages. A downloadable spreadsheet [registration required] illustrates Ferris' model.)
Even worse, Jennings says, organizations incur potentially greater costs through missed opportunities because of false positives that they never see -- for example, a consulting firm that fails to receive a request for proposal.
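To put the two Ferris per-message figures side by side, here is a minimal arithmetic sketch. The function names and the message counts in the usage note are hypothetical; only the $0.04 and $3.50 figures come from the article, and the missed-opportunity cost Jennings describes is not modeled.

```python
# Per-message costs quoted from Ferris Research in the article.
COST_DELETE_SPAM = 0.04           # dollars to delete one spam message
COST_FIND_FALSE_POSITIVE = 3.50   # dollars to locate one missing true e-mail


def handling_cost(spam_deleted, false_positives_recovered):
    """Total handling cost in dollars (hypothetical helper, not Ferris' model)."""
    return (spam_deleted * COST_DELETE_SPAM
            + false_positives_recovered * COST_FIND_FALSE_POSITIVE)


def break_even_spam_count():
    """Number of spam deletions that cost as much as one false positive."""
    return COST_FIND_FALSE_POSITIVE / COST_DELETE_SPAM
```

For example, `handling_cost(1000, 10)` comes to about $75: roughly $40 for deleting 1,000 spam messages and $35 for tracking down just 10 false positives. By these figures, one false positive costs about as much as deleting 87 or 88 spam messages.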
To minimize the false positives caused by spam filters, it helps to know a bit about how they work. To keep up with ever more sophisticated spam, filters have employed a variety of techniques over the years, often in combination with one another. Here is a bird's-eye view of some popular techniques, in rough chronological order:
Keyword-Based and Bayesian Filters
The earliest filters searched a subject line and message body for particular words, such as "Viagra" or "online pharmacy." More sophisticated versions employ Bayesian analyses, which combine keyword searches with techniques such as determining ratios of "good" to "bad" words and assigning probability scores based on these ratios.
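The idea of scoring a message from ratios of "good" to "bad" words can be sketched as a small naive Bayes classifier. This is an illustration under simplifying assumptions (whitespace tokenization, add-one smoothing, at least one training message of each kind), not the method of any particular product; the function names are hypothetical.

```python
import math
from collections import Counter


def train(messages):
    """Count word occurrences per class from (text, is_spam) pairs."""
    counts = {True: Counter(), False: Counter()}
    totals = {True: 0, False: 0}
    for text, is_spam in messages:
        counts[is_spam].update(text.lower().split())
        totals[is_spam] += 1
    return counts, totals


def spam_probability(text, counts, totals):
    """Estimate P(spam | words) with naive Bayes and add-one smoothing."""
    vocab_size = len(set(counts[True]) | set(counts[False]))
    spam_words = sum(counts[True].values())
    good_words = sum(counts[False].values())
    # Start from the prior odds, then add the log ratio for each word.
    log_odds = math.log(totals[True] / totals[False])
    for word in text.lower().split():
        p_spam = (counts[True][word] + 1) / (spam_words + vocab_size)
        p_good = (counts[False][word] + 1) / (good_words + vocab_size)
        log_odds += math.log(p_spam / p_good)
    # Convert log odds to a probability without overflowing exp().
    if log_odds >= 0:
        return 1 / (1 + math.exp(-log_odds))
    return math.exp(log_odds) / (1 + math.exp(log_odds))
```

Trained on a handful of labeled messages, `spam_probability` returns a score between 0 and 1; a filter would then compare that score with a threshold of its choosing. Words seen mostly in spam push the score up, and words seen mostly in good mail pull it down.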
Challenge Response Systems
Unrecognized senders receive a reply asking them to validate themselves by supplying letters and characters that appear in images onscreen, a technique also known as CAPTCHA (completely automated public Turing test to tell computers and humans apart). This test is based on the idea that humans can detect and input certain patterns, while computers are unable to do so. Once a sender has been validated, that sender's e-mail messages are sent straight through without the challenge step.
Blacklisting, Whitelisting and Reputation Listing
With these techniques, the filter evaluates not the message, but the characteristics of the sender, in particular the sender's previous record concerning spam.
Blacklists are databases that collect the IP addresses of known spammers from around the world. The spam filter checks incoming messages against the blacklist and refuses to accept e-mail from these addresses. Depending on the specific spam filter product used, the blacklist it checks could be local (i.e., maintained at a company's own network), remote (maintained centrally, independent of a specific company) or a combination. Some centrally maintained blacklists are publicly available, while others are fee-based services.
Whitelists collect the IP addresses of trusted e-mail sources on a "good sender" list, and the filter automatically accepts e-mail from those addresses. As with blacklists, a spam filter product could check a local whitelist, a centrally maintained one or both. Many spam filters make use of both blacklists and whitelists.
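A local list check of this kind might look like the following minimal sketch. The function name, the three return labels and the rule that the whitelist is consulted first are assumptions made for illustration; the address ranges in the usage note are reserved documentation ranges, not real senders.

```python
import ipaddress


def classify_sender(ip, whitelist, blacklist):
    """Return 'accept', 'reject' or 'unknown' for a sending IP address.

    whitelist and blacklist are iterables of ipaddress network objects.
    Checking the whitelist first is an assumption of this sketch.
    """
    addr = ipaddress.ip_address(ip)
    if any(addr in net for net in whitelist):
        return "accept"
    if any(addr in net for net in blacklist):
        return "reject"
    return "unknown"
```

For example, with `whitelist = [ipaddress.ip_network("192.0.2.0/24")]` and `blacklist = [ipaddress.ip_network("198.51.100.0/24")]`, a message from `192.0.2.15` is accepted, one from `198.51.100.7` is rejected, and one from `203.0.113.9` is on neither list and is left to the filter's other techniques.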
The term reputation service (or reputation list) is sometimes used to refer to a technique that makes use of blacklists and whitelists but broadens them by considering not only the sending IP address, but the entire domain. However, the terminology is used inconsistently in the industry, with the terms "blacklist" and "reputation list" often used interchangeably.
In some cases, vendors use "reputation service" or "reputation list" to differentiate their lists from the community heritage of blacklists and whitelists. But Jennings cautions against buying into the idea that reputation lists are all run professionally, while all blacklists are "cesspools of false positives." He says he has yet to see a reputation list that's truly different from traditional blacklists or whitelists.
Graylisting
A recipient e-mail system temporarily rejects e-mail from an unknown IP address (one that's on neither a whitelist nor a blacklist). The recipient system sends an automated response to the sending system that informs the latter of the temporary failure. Theoretically, a "real" sender will simply resend the message, while a spammer will lack the patience to do so.
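The temporary-rejection logic can be sketched in a few lines. This is a simplified illustration, not a production implementation: the class name, the five-minute minimum retry delay and the injectable clock are all assumptions, and a real system would also consult its whitelist and blacklist before reaching this step.

```python
import time


class Graylist:
    """Temporarily reject mail from unknown IPs until they retry later."""

    def __init__(self, min_delay=300, clock=time.time):
        self.min_delay = min_delay   # seconds a sender must wait (assumed value)
        self.clock = clock           # injectable so the logic can be tested
        self.first_seen = {}         # sender IP -> time of first attempt

    def check(self, sender_ip):
        """Return 'tempfail' on early attempts and 'accept' after the delay."""
        now = self.clock()
        first = self.first_seen.setdefault(sender_ip, now)
        if now - first >= self.min_delay:
            return "accept"
        return "tempfail"
```

The first attempt from a new address records a timestamp and gets a temporary failure. A legitimate mail server that retries after the delay is accepted, while a sender that never retries is never delivered.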
Tarpitting
A service on the mail server slows down incoming connections as long as possible; the delay is meant to discourage spammers by forcing them to spend more time sending their spam. The technique is named after real-life tarpits, in which animals get bogged down, eventually sinking and suffocating. (The original tarpitting program was called LaBrea, after the Los Angeles tarpit.) The disadvantage of this approach, of course, is that legitimate e-mail also takes longer.
Recurrent Pattern Detection
This proprietary technique relies on the fact that a spam outbreak, by definition, involves widespread distribution of e-mail. The RPD system, developed and maintained by security vendor Commtouch, monitors the Internet for such outbreaks and determines the patterns they contain, then updates a central database of spam patterns. (Commtouch both sells its own antispam products and licenses the RPD technology to other antispam vendors.) Company e-mail systems using RPD query the database, and e-mail identified as spam is discarded or quarantined.
Tips for Combating False Positives
On both the sending and the receiving end, minimizing false positives is critical for your organization. The real challenge comes from the fact that any or all of the spam-filtering techniques listed above may be employed on your own systems and on the systems of your recipients. Here are some steps you can take.
1. Do use a spam filter.
The occurrence of false positives can leave you wondering if you should simply toss your spam filter -- don't.
False positives can occur even without using a filter, such as when a user, seeing multiple spam subjects in an in-box, manually hits "delete" multiple times, not realizing that buried within that list is a "good" e-mail. A state-of-the-art spam filter, on the other hand, will catch 97% to 99% of spam, according to Ferris Research's Jennings, thus preventing the indiscriminate manual deleting scenario. And although spam filters can produce false positives, their rate of doing so is far lower (as low as 0.01%) than the rate incurred through pure human action, says Jennings.
2. Locate your filter at the network DMZ.
A demilitarized zone (DMZ) in the context of a computer network refers to a portion of that network that buffers the private internal network from the public Internet. The systems in the DMZ are vulnerable to attacks from the outside, but their presence protects the internal network from outside attacks.
Putting your spam filter at the DMZ, according to Jennings, allows it to monitor the characteristics of the connection and acquire more information about incoming e-mail messages, which can be critical to determining whether the message is spam or not. "If the sender is a Windows ME box," he says, "why would it be sending me e-mail directly, rather than going through a legitimate e-mail server? In that case, it's almost certainly a zombie, so the message is going to be spam."
3. Move away from older filtering technologies.
Michael Briggs, director of information technology at The George Washington University Law School, recommends moving away from old-fashioned keyword technologies in favor of newer techniques such as graylisting. (See "How CAPTCHA got trashed.")
In the same way, Jennings has strong concerns about challenge response systems, saying they're "simply a terrible idea." He points out that a legitimate sender might never see the challenge message, because that message itself could be flagged as spam, and because spammers often disguise spam as such a message.
4. Enlist your users to help maintain your whitelist.
Your users are constantly developing relationships with new clients, vendors and other contacts, which means that if you rely on a whitelist of trusted senders, it needs to be continually updated. Lucio Gonzalez, a system specialist and e-mail administrator at South Texas College in McAllen, appreciates it when employees at the college tell him about their new contacts -- for example, when the college gains new suppliers.
He adds them to his whitelist, and messages from these senders get through more quickly and don't risk being flagged as spam. Periodic reminders to your users to keep the IT department informed of new contacts will save everybody time and hassle.
Better yet, suggests Andrew Lochart, vice president of product marketing at e-mail security vendor Proofpoint, let users set their own spam filter parameters. In his words, spam, like beauty, lies in the eye of the beholder.
Although few people want the male enhancement or online pharmacy ads, some business travelers, for instance, might want their weekly notices from Delta Air Lines or Hertz. Such flexibility ultimately benefits both an end user and an e-mail administrator by reducing efforts by both of them to recover false positives.
5. Choose blacklists and reputation lists wisely.
If your organization relies on a blacklist or reputation list to stop spam, Jennings urges you to consider carefully which one to use. He points out that many spam filter products let the customer configure the product as to which blacklist, if any, to use.
When choosing a blacklist, Jennings recommends that you check the management policies of the lists. For example, some blacklists and reputation lists are driven purely by user complaints, says GWU's Briggs, and relying on them will invariably lead to false positives.
Not sure where to start? Ask your spam filter vendor for recommendations, suggests Jennings.
It's also important to keep up with the status of your blacklist or reputation list. Jennings cites the example of ORDB, a blacklist that was shut down in 2006, but which nonetheless still received queries from systems following the shutdown. These queries, according to Jennings, overwhelmed the servers that had housed ORDB, preventing the former ORDB administrators from doing other work. (In other words, the queries amounted to a denial-of-service attack, unintentional though it was.)
In early 2008, to stop these queries, the operators brought ORDB back online but set it up to flag every IP address reported to it as a spam source -- the only way, they believed, to gain the attention of e-mail administrators and get them to stop querying ORDB. Had these administrators been more alert to begin with, they would have investigated, discovered that ORDB was going away and redesigned their procedures accordingly, without requiring drastic measures from the ORDB operators.
6. Make sure you're not sending out spam.
If spam goes out from your systems, even unintentionally, it hurts your reputation and increases the likelihood you'll end up on spam blacklists. If you send enough spam, Jennings says, your reputation may suffer to the point that you will have trouble sending legitimate e-mail.
A three-pronged approach will help keep your reputation intact:
First, suggests Stephen Pao, vice president of product management at security vendor Barracuda Networks, curb your users' questionable Web browsing. If users visit dangerous or objectionable sites, malware from those sites could be installed on their computers, which could then be used to send spam from your systems. To prevent this, set clear acceptable-use policies and deploy Web monitoring or filtering software from a trusted security vendor such as Websense or Sophos on your users' systems. Note that monitoring your employees' Web use could involve legal and privacy issues, so be sure that you are complying with any applicable statutes, giving your users appropriate notice if necessary.
Along the same lines, Pao recommends that you stay up to date with security patches and virus and malware definitions to ensure that spammers can't take over your systems and use them to send spam.
Finally, Jennings recommends using outbound filtering to make absolutely sure no spam is being sent from your systems.
7. Check your own spam reputation.
If your organization is on a blacklist, your recipients might not receive your outgoing e-mail. For that reason, Lochart recommends regularly checking your own reputation. He suggests, for example, visiting habeas.com, a site that provides companies with a free reputation check and helps them otherwise manage their online reputations.
If you do find your company unjustifiably on such a list, Lochart suggests that you contact its administrator to voice your concerns. In some cases, though, a major league umpire would sooner reverse his called third strike than the administrator would "un-blacklist" you. The inflexibility of such blacklists adds to the occurrence of false positives -- yet another reason to be careful when choosing your own blacklist or reputation list.
8. Warn your users to be wary of "red flag" words.
If you're waiting in line for security screening at Dulles Airport, you would be ill-advised to discuss bombs, weapons or hijacking. In the same way, Gonzalez recommends that in sending e-mail, you avoid those red flag words that are associated with spam.
Although keyword spam filtering is less desirable and less common these days, your recipient still might be using it. Therefore, if you can, avoid those words that might trigger a response from the filter, not all of which are intuitive. Gonzalez specifically mentions words such as "hey" or "hello." Other suspects include "free," "enlarge," "pharmacy," "alert" and "diploma."
Putting these words in a message or subject line may not automatically get your e-mail flagged as spam. Nonetheless, if you can avoid such words, so much the better.
Conversely, try to include recipient-specific information in your messages, such as project names or personal references unique to your recipient. Doing so can lessen the chance that Bayesian analysis of your message will cause it to be flagged.
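As a rough illustration of the red flag advice in this tip, a sender could scan a draft for the words named above before sending it. The function name and the simple tokenization are assumptions; the word list contains only the examples cited in this article and is not any filter's actual list.

```python
import re

# Example red flag words mentioned in the article; not an exhaustive list.
RED_FLAG_WORDS = {"hey", "hello", "free", "enlarge", "pharmacy", "alert", "diploma"}


def find_red_flags(subject, body):
    """Return the red flag words found in a draft, in alphabetical order."""
    words = re.findall(r"[a-z']+", f"{subject} {body}".lower())
    return sorted(set(words) & RED_FLAG_WORDS)
```

For example, `find_red_flags("Hello team", "Free lunch alert on Friday")` returns `['alert', 'free', 'hello']`, while a draft with the subject "Project Falcon status" and the body "Budget review attached" returns an empty list. A match does not mean the message will be blocked, only that rewording may be worth considering.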
By reducing false positives, you help ensure that real e-mail from your senders actually gets to you, and that real e-mail from you actually gets to them.
Calvin Sun consults with clients to improve their organizational effectiveness. He has written previously for Computerworld and for other major publications.
Calvin Sun, Computerworld