Misplaced Caution About Open-Source Anti-Spam?

Given that the new open-source approaches to spam filtering are capable of virtually eliminating unwanted e-mail and preserving the good stuff, why do many companies continue to struggle with spam?

Jonathan Zdziarski, the developer of the DSPAM open-source Bayesian spam blocker, believes IT departments of most small- to medium-sized businesses are afraid to try free programs or meet resistance from higher-level company executives.

“Most mid-sized companies just pull an appliance off the shelf,” he said, noting there are “a million anti-spam companies out there with boxes loaded with a hodgepodge” of solutions. “That’s one of the reasons these businesses, if you ask them, are convinced spam filtering is ineffective … A lot of these companies are running technology that’s five to seven years old.”

Some popular, commercially distributed solutions claim to employ Bayesian filters. When used alone, as in DSPAM and other similar programs, these filters use statistical analysis to yield highly accurate spam control.
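To make the idea of statistical filtering concrete, here is a minimal sketch of a token-based, naive-Bayes-style classifier in Python. It is not DSPAM's algorithm or Graham's exact formula; the class name `TinyBayesFilter`, the tokenizer and the add-one smoothing are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


class TinyBayesFilter:
    """Toy statistical spam filter (illustrative only, not DSPAM)."""

    def __init__(self):
        self.spam_tokens = Counter()  # number of spam messages containing each token
        self.ham_tokens = Counter()   # number of ham messages containing each token
        self.spam_msgs = 0
        self.ham_msgs = 0

    @staticmethod
    def tokenize(text):
        # Unique lowercased words; real filters use far richer tokenizers.
        return set(re.findall(r"[a-z0-9']+", text.lower()))

    def train(self, text, is_spam):
        tokens = self.tokenize(text)
        if is_spam:
            self.spam_tokens.update(tokens)
            self.spam_msgs += 1
        else:
            self.ham_tokens.update(tokens)
            self.ham_msgs += 1

    def spam_probability(self, text):
        # Start from the prior odds, then add one log-likelihood ratio per token.
        log_odds = math.log((self.spam_msgs + 1) / (self.ham_msgs + 1))
        for tok in self.tokenize(text):
            p_spam = (self.spam_tokens[tok] + 1) / (self.spam_msgs + 2)
            p_ham = (self.ham_tokens[tok] + 1) / (self.ham_msgs + 2)
            log_odds += math.log(p_spam / p_ham)
        log_odds = max(-50.0, min(50.0, log_odds))  # avoid overflow in exp()
        return 1 / (1 + math.exp(-log_odds))


f = TinyBayesFilter()
f.train("Buy cheap pills now", is_spam=True)
f.train("Cheap pills, free offer!!!", is_spam=True)
f.train("Meeting agenda for Monday", is_spam=False)
f.train("Lunch on Monday with the team", is_spam=False)

print(f.spam_probability("cheap pills offer") > 0.5)  # scores as spam
print(f.spam_probability("monday meeting") < 0.5)     # scores as ham
```

Every call to `train` shifts the token counts, which is why a filter of this kind can keep improving as users correct it.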

However, Zdziarski, the author of a new book titled “Ending Spam,” asserts most of the time the Bayesian filtering in these “hybrid” commercial products, if present at all, is rendered virtually ineffective because it filters only the mail that finds its way through the commercial programs’ outdated “heuristic” filtering layer.

Good mail, or “ham,” that was improperly deemed to be suspicious by the heuristic filter may never reach the Bayesian filter layer. This prevents the filter from learning what makes good mail “hammy” and it further increases the application’s error rate.
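The starvation effect can be sketched in a few lines of Python. The rule in `heuristic_blocks`, the sample messages and the function names are all hypothetical; the point is only that ham dropped by an upstream layer never contributes to what the statistical layer learns.

```python
from collections import Counter


def heuristic_blocks(message):
    # Hypothetical static rule: block anything containing the word "free".
    return "free" in message.lower().split()


def learned_ham_tokens(messages, use_heuristic_layer):
    """Return the ham token counts the statistical layer gets to learn from."""
    ham_tokens = Counter()
    for text, is_spam in messages:
        if use_heuristic_layer and heuristic_blocks(text):
            continue  # dropped before the learning layer ever sees it
        if not is_spam:
            ham_tokens.update(text.lower().split())
    return ham_tokens


mail = [
    ("free lunch for the team on friday", False),  # legitimate
    ("claim your free prize now", True),           # spam
    ("are you free for a call tomorrow", False),   # legitimate
]

hybrid = learned_ham_tokens(mail, use_heuristic_layer=True)
standalone = learned_ham_tokens(mail, use_heuristic_layer=False)

print(hybrid["free"])      # 0: the learner never saw "free" in good mail
print(standalone["free"])  # 2: the learner knows "free" also appears in ham
```

In the hybrid arrangement the learner has no evidence that "free" appears in legitimate mail, so it cannot correct the heuristic layer's mistake.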

The ability to tell good mail from spam is one of the most touted attributes of open-source spam-blocking programs using Bayesian statistical filtering as suggested by Paul Graham in “A Plan for Spam.” Anyone who’s ever absent-mindedly deleted an important e-mail that was improperly routed to a spam bucket can relate.

“False positives are innocent e-mails that get mistakenly identified as spams,” wrote Graham in his paper three years ago. “For most users, missing legitimate e-mail is an order of magnitude worse than receiving spam, so a filter that yields false positives is like an acne cure that carries a risk of death to the patient.”

Zdziarski contends the Bayesian element mentioned on the boxes and ads for most commercial spam blockers “is more of a marketing term, really, than any type of real component of the solution.” That’s because a company selling an adaptive Bayesian spam filter would have a tough time staying in business.

“A true, adaptive solution is like fine wine,” Zdziarski said. “You can take a tool like DSPAM … install it in a system, stick it in a closet and let it do its job with just basic user input. Let it sit there and run, without upgrades, for a couple years. When you take a look at it again, it will be performing better than it did on Day One.”

Systems administrators and others facing a decision about e-mail filtering must weigh the cost of using commercial products against the fact that statistical language classification filters, while free, work best if users are trained to help them out.

Employees need to cooperate by “teaching” the programs the difference between spam and ham, a simple task that gets easier over time as the programs gain knowledge.

IT people must also determine the company’s tolerance level for spam. Maybe 95 percent accuracy is good enough, even though it means up to five errors per hundred e-mails, or 10 times more than would pass through a good statistical filter.
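The arithmetic behind that comparison can be worked through in a short Python sketch. The 95 percent figure and the factor of 10 come from the article; the daily mail volume is a hypothetical number added only to show scale.

```python
heuristic_accuracy = 0.95  # from the article
ratio = 10                 # "10 times more" errors, from the article

# Errors per hundred messages at 95 percent accuracy.
heuristic_errors_per_100 = round(100 * (1 - heuristic_accuracy), 6)

# Implied error rate and accuracy for the statistical filter.
statistical_errors_per_100 = heuristic_errors_per_100 / ratio
implied_statistical_accuracy = 1 - statistical_errors_per_100 / 100

daily_messages = 2000  # hypothetical volume, not from the article
heuristic_daily_errors = daily_messages * heuristic_errors_per_100 / 100
statistical_daily_errors = daily_messages * statistical_errors_per_100 / 100

print(heuristic_errors_per_100, statistical_errors_per_100)
print(implied_statistical_accuracy)
print(heuristic_daily_errors, statistical_daily_errors)
```

On these assumptions, a filter that errs ten times less often than a 95 percent one is about 99.5 percent accurate, or roughly one error in every 200 messages.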

Zdziarski says he’s sometimes unnerved by his filter’s uncanny accuracy. For most systems administrators, the thought of employees opening spam containing viruses is scarier still, almost as bad as accidentally deleting that important e-mail from the CEO.
