


  



NEWS

SpamBayes 1.1a6 is now available! (This includes both the source archives and a
Windows binary installer).

See the download page for more info or select an appropriate version from the
table below.

You may also like to see what other people have been saying about us in the
press and elsewhere.


DOWNLOAD SPAMBAYES

Locate the row which contains your operating system and mail program to see
which version of SpamBayes is right for you. If you can test any of the
configurations, please let us know. Note that installing a source release is
more involved than the binary releases.

Please try the test releases if at all possible. While they are still labelled
as "alpha", they are really quite stable. We're just extremely conservative/lazy
about doing beta/final releases. ;-)

| Mail Program | Operating System | Stable Release | Test Release | Notes |
|---|---|---|---|---|
| Outlook 2010 | Windows 7 | 1.1a6 | 1.1a6 | A number of people have reported that SpamBayes will not work on Windows 7 unless you install it in a non-default location. Installing in C:\SpamBayes should do the trick. |
| Outlook 2000/2003/2007/2010 | Windows XP/Vista | 1.1a6 | 1.1a6 | If some combination of Windows and Outlook versions isn't listed, please give it a try anyway... 64-bit support is currently problematic. Windows 3.1/ME are not supported, nor are Outlook versions < 2000. |
| Outlook Express | Windows XP | 1.1a6 | 1.1a6 | |
| Windows Live Mail | Windows XP/Vista/7 | 1.1a6 | 1.1a6 | Untested. It should work at least as well as Outlook Express though. |
| Other | Windows XP/Vista/7 | | 1.1a6 (binary), 1.1a6 (source) | Source code for the rest of the Windows world. (Binary installer also - might work in some circumstances.) |
| IncrediMail | Windows XP/Vista/7 | | 1.1a6 (source) | Completely untested. May work with POP3. Rudimentary IMAP support may prevent that protocol. |
| Thunderbird 2.x | Any | | Thunderbayes | Tighter integration similar to Outlook plugin. Uses 1.0.4. |
| Thunderbird 3.x | Any | | Thunderbayes++ | ThunderBayes extended to work with Thunderbird 3.x (Thunderbird 6 is not yet supported.) |
| Thunderbird 6.x | Any | | 1.1a6 (source) | Thunderbayes is not yet supported on Thunderbird 6 |
| Gmail | Any | | 1.1a6 (source) | POP3 works. IMAP probably not. |
| Yahoo! Mail | Any | | 1.1a6 (source) | Completely untested. POP3 & IMAP? |
| MSN Hotmail | Any | | 1.1a6 (source) | Completely untested. POP3 & IMAP? |
| AOL Mail | Any | | 1.1a6 (source) | Completely untested. IMAP only? |
| Any | Mac OS X/Linux/Solaris | | 1.1a6 (source) | POP3/IMAP, etc. Be prepared to get your hands wet... |


WHAT IS SPAMBAYES?

The SpamBayes project is working on developing a statistical (commonly, although
a little inaccurately, referred to as Bayesian) anti-spam filter, initially
based on the work of Paul Graham. The major difference between this and other,
similar projects is the emphasis on testing newer approaches to scoring
messages. While most anti-spam projects are still working with the original
Graham algorithm, we found that a number of alternative methods yielded a more
useful response. This is documented on the background page.

SpamBayes is not a single application. The core code is a message classifier;
however, there are several applications available as part of the SpamBayes
project which use the classifier in specific contexts. For the most part, the
current crop of applications all operate on the client side of things; however,
a number of people have experimented with using SpamBayes on mail servers to
classify incoming mail for multiple users. The table below outlines the main
applications which are part of the SpamBayes distribution.

| Application | Description |
|---|---|
| Outlook Plugin | A plugin for Microsoft Outlook which tightly integrates classification and training into the Outlook interface |
| Pop3proxy / sb_server | A mail filter which sits between the user's POP3 server(s) and the user's mail client and presents a web-based training interface |
| Imapfilter | A mail filter similar to pop3proxy but which talks the IMAP protocol |
| Hammiefilter / sb_filter | A simple mail filter suitable for embedding in a procmail environment |


THAT'S GREAT, BUT WHAT'S SPAMBAYES?


(THE NON-TECHNICAL HAND-WAVING ANSWER)

SpamBayes will attempt to classify incoming email messages as 'spam', 'ham'
(good, non-spam email) or 'unsure'. This means you can have spam or unsure
messages automatically filed away in a different mail folder, where they won't
interrupt your email reading. First SpamBayes must be trained by each user to
identify spam and ham. Essentially, you show SpamBayes a pile of email that you
like (ham) and a pile you don't like (spam). SpamBayes will then analyze the
piles for clues as to what makes the spam and ham different: for example,
different words, differences in the mailer headers and content style. The system
then uses these clues to examine new messages.
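
To make the training step a little more concrete, here is a minimal Python sketch of the "two piles" idea. It is not the SpamBayes code: the `Trainer` class, the `tokenize` helper and the very simple word-splitting rule are all assumptions made up for this illustration, and it looks only at words in the text, not at mail headers or content style.

```python
import re
from collections import Counter

# Hypothetical tokenizer: lowercase runs of letters, digits and apostrophes.
TOKEN_RE = re.compile(r"[a-z0-9']+")


def tokenize(text):
    """Return the set of distinct word-like clues found in a message."""
    return set(TOKEN_RE.findall(text.lower()))


class Trainer:
    """Counts, per clue, how many spam and ham messages contained it."""

    def __init__(self):
        self.spam_counts = Counter()
        self.ham_counts = Counter()
        self.nspam = 0  # number of spam messages seen
        self.nham = 0   # number of ham messages seen

    def learn(self, text, is_spam):
        """Add one message to the spam pile or the ham pile."""
        counts = self.spam_counts if is_spam else self.ham_counts
        # tokenize() returns a set, so each clue counts once per message.
        counts.update(tokenize(text))
        if is_spam:
            self.nspam += 1
        else:
            self.nham += 1


trainer = Trainer()
trainer.learn("Lose weight fast! Lose it now", is_spam=True)
trainer.learn("Lunch at noon?", is_spam=False)
```

After training, the two counters hold the raw material for working out how strongly each clue points towards spam or ham.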

For instance, the word "Nigeria" appears often in spam, so you could use a spam
filter which identifies anything with that word in it as spam. But what if your
business involves writing a guidebook on Nigerian Wildlife Conservation? Clearly
a more flexible approach is necessary. Additionally, spammers will adapt their
content over time and will no longer use the word "Nigeria" (or the words "Lose
Weight Fast", or any number of other common lines). Ideally the software will be
able to adapt as the spam changes.

So, that is what SpamBayes does. It compares the spam and the ham and calculates
probabilities. For instance, for me, the word "weight" almost never occurs in
legitimate email, but it occurs all the time in 'lose weight fast' spam.
SpamBayes can then look at incoming email, extract the most significant clues
and combine the probabilities to produce an overall rating of "spamminess". It
flags the messages so that your mailer can handle the different message types.
You might set it up so that ham goes straight through untouched, spam goes to a
folder that you ignore (or delete without checking) and the unsure messages go
to another folder which you can review for errors.
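
The steps just described (a probability per clue, keeping the most significant clues, combining them, then sorting into three folders) can be sketched roughly as follows. Everything here is an assumption for illustration: the function names, the smoothing constants `s` and `x`, the clue limits and the two cutoffs are made up for this example, and the combining rule is a plain naive-Bayes-style product, which is not necessarily the rule SpamBayes itself uses.

```python
import math


def clue_probability(spam_count, ham_count, nspam, nham, s=0.45, x=0.5):
    """Estimate how spammy one clue is, pulled towards x when evidence is thin.

    spam_count / ham_count: messages in each pile containing the clue.
    nspam / nham: total messages in each pile.
    s, x: assumed smoothing strength and neutral prior.
    """
    n = spam_count + ham_count
    if n == 0:
        return x  # never seen before: stay neutral
    spam_ratio = spam_count / nspam if nspam else 0.0
    ham_ratio = ham_count / nham if nham else 0.0
    raw = spam_ratio / (spam_ratio + ham_ratio)
    return (s * x + n * raw) / (s + n)


def most_significant(probs, max_clues=150, min_strength=0.1):
    """Keep the clues whose probability is farthest from the neutral 0.5."""
    strong = [p for p in probs if abs(p - 0.5) >= min_strength]
    strong.sort(key=lambda p: abs(p - 0.5), reverse=True)
    return strong[:max_clues]


def combine(probs):
    """Combine clue probabilities into one score between 0 and 1."""
    if not probs:
        return 0.5
    eps = 1e-9  # keep log() away from 0 and 1
    clamped = [min(max(p, eps), 1.0 - eps) for p in probs]
    log_odds = sum(math.log(p) - math.log(1.0 - p) for p in clamped)
    log_odds = max(min(log_odds, 700.0), -700.0)  # avoid exp() overflow
    return 1.0 / (1.0 + math.exp(-log_odds))


def route(score, ham_cutoff=0.2, spam_cutoff=0.9):
    """Turn a spamminess score into a folder decision."""
    if score >= spam_cutoff:
        return "spam"
    if score <= ham_cutoff:
        return "ham"
    return "unsure"
```

A mail client rule would then only need the label returned by `route` to decide which folder a message lands in.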


HOW IS SPAMBAYES DIFFERENT?

There are a number of projects similar to SpamBayes; most just use the
original Paul Graham algorithm. Examining the Graham technique with careful
testing showed that it did a remarkably good job, but there was considerable
room for improvement. (See the background page for more.)

The SpamBayes team tinkered with new algorithms, tweaked existing algorithms,
and, most importantly, did enormous test runs, slamming tens of thousands of
messages against each other, in an attempt to quantify whether or not a change
to the system was beneficial.

The new algorithm is a combination of work from Gary Robinson and Tim Peters,
and provides not just a 'spam' and 'ham' rating, but also an 'unsure' rating,
for those messages where it can't work out how to rate the message.
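
This page does not spell out the formula, so the following is only an illustrative sketch of one way an 'unsure' rating can fall out of the maths: chi-squared (Fisher-style) combining, an approach associated with Gary Robinson's work. The function names `chi2q` and `chi_combine` are made up for this example. The idea is to measure the evidence for spam and the evidence for ham separately; when both are strong, or both are weak, the score lands near 0.5.

```python
import math


def chi2q(x2, dof):
    """Probability that a chi-squared variable with `dof` degrees of
    freedom is at least x2. Only valid for positive, even dof."""
    if dof <= 0 or dof % 2:
        raise ValueError("dof must be a positive even integer")
    m = x2 / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    # Rounding can push the sum a hair above 1.0.
    return min(total, 1.0)


def chi_combine(probs):
    """Combine per-clue spam probabilities into a score between 0 and 1.

    Near 1.0 means spam, near 0.0 means ham, near 0.5 means unsure.
    """
    if not probs:
        return 0.5
    eps = 1e-9  # keep log() away from 0 and 1
    clamped = [min(max(p, eps), 1.0 - eps) for p in probs]
    n = len(clamped)
    ln_spam_side = sum(math.log(1.0 - p) for p in clamped)
    ln_ham_side = sum(math.log(p) for p in clamped)
    spam_evidence = 1.0 - chi2q(-2.0 * ln_spam_side, 2 * n)
    ham_evidence = 1.0 - chi2q(-2.0 * ln_ham_side, 2 * n)
    return (spam_evidence - ham_evidence + 1.0) / 2.0


all_spammy = chi_combine([0.99] * 10)             # close to 1.0
conflicting = chi_combine([0.99] * 5 + [0.01] * 5)  # close to 0.5
```

With conflicting clues, both kinds of evidence come out strong and cancel, which is the situation an 'unsure' rating is meant to capture.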

See the background page for more, well, background.

The code (implemented in Python) is currently available by a variety of
methods from the downloads page.

There are now a couple of end-user applications available for those excited by
the bleeding edge - these are detailed on the Applications page, and available
as part of the source download.


CREDITS

Most of the heavy lifting on this project was done by Tim Peters, with the cast
of spambayes obsessive-compulsives providing ideas, heckling, and testing. Gary
Robinson provided a lot of the serious maths and theory, as well as his essay on
"how to do it better" (see the background page for a link). Rob Hooft also
contributed maths/stats clues. Mark Hammond amazed the world with the
Outlook2000 plug-in (with Tony Meyer, Sean True, and Adam Walker making
significant contributions), and Richie Hindle, Neale Pickett, and Tim Stone worked
on the end-user applications.

(Thanks also to Rachel Holkner for turning Anthony's gibberish into something
closer to actual English, although all mistakes are Anthony's.)