PureMessage uses a variety of methods to detect spam.
These methods are embedded in the test definitions of anti-spam rules. Each detection method belongs to one of the
following feature groups:
- Spam Signatures Analysis: Signatures are created using spam data compiled by
SophosLabs. There are signatures for each of the various email message parts, including the
message body, paragraphs within the message body, HTML, images, and attachments. These are all
tested against the contents of the messages that PureMessage processes. Signatures can be used to detect spam
characteristics, even during spam campaigns in which some aspects of the messages are still
evolving.
- Known Spam Destinations: PureMessage includes
a database of URLs associated with spam messages. This database is distributed with the PureMessage Anti-Spam heuristic update. URI tests determine
whether messages contain URLs that are included in this database.
- Adaptive Message Classification: Tests can be generated via "training". Training
applies an adaptive classification algorithm to a message set that consists of a cross-section
of spam and non-spam messages drawn from the archives. These anti-spam rules should not be
manually altered; to change the rule set, re-run the training.
Note: In most cases, adaptive
training classification does not significantly increase spam-filtering accuracy. Not only does
it require additional effort to set up and maintain, but it is also generally less effective
for organizations where there is a wide variation between the definitions of spam and non-spam
messages. It is usually best to rely on the rules and data updates provided by SophosLabs and
to submit any missed spam or false positives to Sophos for analysis. Adaptive training
classification is intended for use by organizations with spam or non-spam patterns that differ
significantly from the norm.
- Sender Reputation: By default, PureMessage
performs two types of DNS checks: reverse DNS look-ups and queries via Sophos's own SXL
infrastructure. Because IP classification is handled as part of the SXL queries, third-party
DNS black lists (DNSBLs) are disabled by default. Before enabling any of the disabled DNSBL rules, be
sure that you have the associated DNSBL licenses, if applicable.
- Heuristic Analysis: Specified message components, such as the subject line or the
message body, are matched against a regular expression. For example, a regular expression test can
check for the occurrence of a specific word or phrase in the body of an email message. On the
Anti-Spam Rules page in the PureMessage Manager, the
test definition component of a regular expression test is prefixed by the word
Test.
- Site Features: PureMessage uses internal
programmatic functions to test for various message characteristics. For example, multiple
similar recipient addresses (johna@domain.com,
johnb@domain.com, johnc@domain.com) often indicate a spam
message. On the Anti-Spam Rules page in the PureMessage
Manager, the test definition component of a message evaluation test is prefixed by the word
Eval.
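As a rough illustration of the regular expression and evaluation tests described in the list above, the following Python sketch shows one of each. It is not PureMessage rule syntax or Sophos code: the phrase pattern, the function names, and the grouping heuristic (addresses in one domain whose local parts differ only in the final character) are all assumptions made for this example.

```python
import re
from collections import Counter

# Hypothetical phrase to look for; not an actual PureMessage rule.
PHRASE_PATTERN = re.compile(r"\bact now\b", re.IGNORECASE)


def regex_test(body):
    """Regular-expression style test: True if the phrase occurs in the body."""
    return PHRASE_PATTERN.search(body) is not None


def similar_recipients_eval(recipients, min_group=3):
    """Evaluation style test: True if at least `min_group` distinct recipients
    in the same domain have local parts that differ only in the last character
    (for example johna@, johnb@, johnc@)."""
    groups = Counter()
    # Normalize and de-duplicate so a repeated address is counted once.
    for address in {r.strip().lower() for r in recipients}:
        local, sep, domain = address.partition("@")
        if sep and len(local) > 1 and domain:
            # Group by the local part minus its final character, plus the domain.
            groups[(local[:-1], domain)] += 1
    return any(count >= min_group for count in groups.values())
```

A programmatic check like `similar_recipients_eval` is useful where a single pattern match cannot express the condition, because it has to compare several header values with each other.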
PureMessage also uses "meta" tests to check the combined results
of two or more tests. For example, a meta test might be configured to be true if two other rules
are both true. You cannot create custom meta tests; however, you can alter the score of existing
tests. On the Anti-Spam Rules page in the PureMessage
Manager, the test definition component of a meta test is prefixed by the word
Meta.
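The idea of a meta test can be sketched in Python as a simple combination of other test results. This is only an illustration of the concept, not how PureMessage implements or stores meta tests; the rule names and the `meta_test` helper are hypothetical.

```python
def meta_test(results, required_rules):
    """Return True only if every rule named in `required_rules` fired.

    `results` maps a rule name to the boolean outcome of that test;
    a rule that is missing from `results` is treated as not having fired.
    """
    return all(results.get(name, False) for name in required_rules)


# Hypothetical outcomes of three individual tests for one message.
example_results = {
    "SUBJECT_PHRASE": True,
    "SIMILAR_RECIPIENTS": True,
    "KNOWN_SPAM_URL": False,
}

# True only when both named rules are true for the message.
combined = meta_test(example_results, ["SUBJECT_PHRASE", "SIMILAR_RECIPIENTS"])
```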
Regular expression tests, message evaluation tests, meta tests and URI tests are enabled by
default, as are DNS checks. Tests generated from adaptive classification training must be
manually configured and enabled.