The Bayesian Plugin is a semi-intelligent solution for recognising Spam. Over
time, it learns from incoming email messages and the words they contain,
marking each word with a "spam" and a "clean"
probability. It filters emails based on these probabilities, producing an
overall score for the whole email.
2. Filtering Process
4. Plugin Window
Each word in the current email is added to the
of known clean/spam words (dependent on the
if(score > learningthreshold) total_number_of_spam_words++
if(score < spamthreshold) total_number_of_clean_words++
spamratio = occurrences / total_number_of_spam_words
cleanratio = occurrences / total_number_of_clean_words
new_word_ratio = 100 * spamratio / (spamratio + cleanratio)
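As an illustration only, the ratio formulas above can be written as a small
Python function. This is a sketch, not the plugin's actual code. It assumes
that "occurrences" means the word's spam count in the spam ratio and its clean
count in the clean ratio, and the handling of empty totals (returning a neutral
50) is also an assumption.

```python
def word_ratio(spam_occurrences, clean_occurrences,
               total_spam_words, total_clean_words):
    """Return a 0-100 spam ratio for one word, following the formulas above."""
    # Assumption: an empty total contributes a ratio of zero instead of
    # raising a division-by-zero error.
    spam_ratio = spam_occurrences / total_spam_words if total_spam_words else 0.0
    clean_ratio = clean_occurrences / total_clean_words if total_clean_words else 0.0
    if spam_ratio + clean_ratio == 0:
        return 50.0  # assumed neutral value when there is no evidence at all
    return 100 * spam_ratio / (spam_ratio + clean_ratio)


# A word seen only in spam gets the maximum ratio.
print(word_ratio(5, 0, 10, 10))
```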
If the word has a score between
(not definitely spam and not definitely clean) it is not added to the
5. Configuration Options
Select "Plugins" -> "Bayesian Filter" from the right-click menu on the
SpamPal tray icon (the umbrella near the taskbar clock).
Each email that is processed by the plugin is copied here (only for the current
SpamPal session) so that it can be reclassified if necessary. A red icon
means the email was classified as spam; a green one means clean.
Functionality of the buttons:
Mark the currently selected emails as spam
Mark the currently selected emails as clean
Remove the currently selected emails
Remove all emails
Increasing this reduces the number of false classifications; decreasing it
makes the filter tag more email as spam. Any word with a
ratio below this threshold is considered a clean word.
Default value 90
Any word with a ratio greater than or equal to this is added to the database
classed as spam.
Default value 99
Limit message processing
Only process the first part of an email (see "Amount of message to
Amount of message to process (kb)
Limit the amount of an individual email that is processed. Set this to avoid
timeouts when the plugin takes too long to process large emails.
The number of significant words examined during classification of an email.
Checking fewer words makes the filter more "trigger-happy"; checking more
words means more spam words must be present for an email
to be classified as spam.
Default value 10
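The text does not say how "significant" words are chosen. One common approach
in Bayesian filters, assumed here purely for illustration, is to take the
words whose ratio lies furthest from the neutral value of 50. The function
name and the sample ratios below are hypothetical.

```python
def most_significant(word_ratios, count=10):
    """Return up to `count` (word, ratio) pairs furthest from the neutral 50.

    `word_ratios` maps each word in the email to its 0-100 spam ratio.
    Assumption: "significant" means "far from 50"; the plugin may differ.
    """
    ranked = sorted(word_ratios.items(),
                    key=lambda item: abs(item[1] - 50),
                    reverse=True)
    return ranked[:count]


ratios = {"viagra": 99, "hello": 45, "meeting": 5, "the": 50}
print(most_significant(ratios, count=2))
```

With a small count, a couple of strongly scored words decide the result,
which is why a low setting behaves in a more "trigger-happy" way.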
Min/Max word length
Set the minimum and maximum length of words used during filtering.
Every word is tagged with the time it was last encountered. This threshold
ensures that words that haven't occurred recently are removed from the
If a word has not appeared for X days (word expiry), its spam and clean
occurrence counts are each decremented once per day until they reach
zero. When both reach zero, the word is removed from the database.
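The expiry rule can be sketched as a single daily pass over the word list.
This is a minimal illustration, not the plugin's implementation: the
in-memory layout (word -> [spam_count, clean_count, last_seen]) and the
function name are assumptions.

```python
SECONDS_PER_DAY = 86400


def expire_words(words, now, expiry_days):
    """Apply one daily expiry pass to `words` in place.

    `words` maps word -> [spam_count, clean_count, last_seen_timestamp],
    a hypothetical layout chosen for this sketch.
    """
    cutoff = now - expiry_days * SECONDS_PER_DAY
    for word in list(words):  # copy the keys so entries can be deleted
        spam, clean, last_seen = words[word]
        if last_seen >= cutoff:
            continue  # seen recently enough, leave untouched
        spam = max(spam - 1, 0)
        clean = max(clean - 1, 0)
        if spam == 0 and clean == 0:
            del words[word]  # both counts exhausted, drop the word
        else:
            words[word] = [spam, clean, last_seen]


words = {"old": [1, 0, 0], "fresh": [1, 0, 1000000]}
expire_words(words, now=1000000, expiry_days=5)
print(sorted(words))
```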
Minimum word occurrence for filtering
Sets the minimum number of times a word has to appear before it is used in
filtering. A low setting will make the plugin more "trigger-happy", letting it
mark emails based on less data.
Incoming words are case-sensitive
If unselected, all new email will be converted to lower case before filtering
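A rough sketch of how the word-length and case options might combine when an
email is split into words. Splitting on whitespace is an assumption made for
brevity (the plugin's real tokenising rules are not described here), and the
function and parameter names are hypothetical.

```python
def filter_words(text, min_len, max_len, case_sensitive=False):
    """Split `text` into words and keep those within the length limits."""
    if not case_sensitive:
        text = text.lower()  # mirrors leaving the case-sensitive option unselected
    # Assumption: words are separated by whitespace only.
    return [word for word in text.split() if min_len <= len(word) <= max_len]


print(filter_words("Buy CHEAP a pills", min_len=3, max_len=5))
```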
Create log file
Turns on/off logging
Learn (don't mark spam)
The plugin will do everything it normally does except mark an email
as spam. This lets the filter "learn" your email
without an initial period in which it might mark a lot of email as spam before it
"knows" your email.
Don't forget to turn this option off when you think the filter has seen enough
of your email ;-)
Assume whitelisted email is clean
Selecting this means that the plugin will score any whitelisted email as zero,
i.e. perfectly clean.
Learn from whitelisted emails
Whether words found in whitelisted emails are added to the database (Use in
conjunction with the above).
Example: You may be subscribed to a mailing list about spam (containing words
that would be scored as spam) that you have whitelisted. If whitelisted email
is considered clean then the words in these emails would be added to the
database as clean. This option allows you to stop that happening.
Include headers in filtering
Select this if you want to include all of the email's headers in the Bayesian
Add "X-Bayesian-Words" header
Whether to add the "X-Bayesian-Words" header, which lists the
interesting words that were found (and their scores). N.B. The
"X-Bayesian-Result" header will always be added.
Learn from SpamPal and other plugins
If selected, the plugin will learn using results from SpamPal and all other
Maintenance of the list of words that the plugin will ignore.
Functionality of the buttons:
Add the word from the edit box into the list
Remove the selected words
Empty the list
Revert back to the state of the list before the configuration window was opened
Load the default ignore words
These functions act as if the files were received as email. If the file(s) that
are imported are not complete email messages the results are not guaranteed.
Import directory into database as spam/clean
Imports all files in a directory into the database
Choose the language you wish the plugin to use.
7. Recommended plugins
The default files are held in the plugin directory (e.g. C:\Program Files
be changed. There is a "user" copy in your SpamPal configuration
Your default SpamPal user configuration directory is...
Windows XP: C:\Documents and Settings\%USERNAME%\Application Data\SpamPal
Windows 2k: C:\Documents and Settings\%USERNAME%\Application Data\SpamPal
Windows NT: C:\WinNT\Profiles\%USERNAME%\Application Data\SpamPal\plugins
Windows 98: C:\Windows\Application Data\SpamPal\plugins\bayesian\
Windows 95: C:\Program Files\Spampal\config\plugins\bayesian\
n.b. This is also where the log files are saved.
The format of the wordlist file is shown below:
Spam = 947 // number of emails received classed as spam
Clean = 1744 // number of emails received classed as clean
adage = 1,0,0.99000001,1041011569 // word = spam_occurrences,clean_occurrences,spam_ratio,timestamp
advert = 1,0,0.99000001,1041011569
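For anyone wanting to inspect a wordlist file, the layout above could be read
with a few lines of Python. This is only a sketch based on the sample lines:
it assumes that "//" starts an annotation that can be discarded, that word
entries always have exactly four comma-separated fields, and that the totals
are stored under the keys "Spam" and "Clean". The function name is
hypothetical.

```python
def parse_wordlist(text):
    """Parse wordlist text into (totals, words).

    totals: {"Spam": int, "Clean": int}
    words:  {word: (spam_occurrences, clean_occurrences, spam_ratio, timestamp)}
    """
    totals = {}
    words = {}
    for raw_line in text.splitlines():
        line = raw_line.split("//", 1)[0].strip()  # assumed: "//" begins a note
        if not line or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        fields = value.split(",")
        if len(fields) == 4:
            spam, clean, ratio, stamp = fields
            words[key] = (int(spam), int(clean), float(ratio), int(stamp))
        elif key in ("Spam", "Clean"):
            totals[key] = int(value)
    return totals, words


sample = """Spam = 947 // number of emails received classed as spam
Clean = 1744
adage = 1,0,0.99000001,1041011569
"""
totals, words = parse_wordlist(sample)
print(totals["Spam"], words["adage"][0])
```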
The format of the ignore file is shown below:
8. Other useful plugins
Good Words plugin