pmx-spam - Interface to the PureMessage anti-spam component
pmx-spam <command> [options]
The pmx-spam program is used to view and set weights for anti-spam rules. It
is also used to generate adaptive classification rules via 'training'. pmx-spam
can be used to scan individual messages and display the matching rules.
Rule tests are defined in the *.rules files located in the
/opt/pmx/etc/data/antispam directory. Customer-specific overrides or new rules
can be found in the /opt/pmx/etc/spam.d directory.
The anti-spam engine in PureMessage uses 'features' and 'weights', two separate attributes of message scanning.
A ``feature'' is a characteristic of a message. If a message has a feature, it is similar in some way to other messages that have the same feature. Similarly, a message without a particular feature is different from a message with that feature.
The presence of a feature in a message is determined by testing components of the message against regular expressions defined in /opt/pmx/etc/data/antispam/re.rules. Custom features can be defined in the file /opt/pmx/etc/spam.d/re.rules.
Weights
Each feature has an associated ``weight''. A feature's weight expresses the likelihood that a message containing that feature is spam. The higher the weight, the more likely the presence of the feature is indicative of spam. Conversely, negative weights express the likelihood that a message is not spam; the lower the negative number, the more likely that the message is not spam.
Weights can be expressed as either numerical values or 'probability deltas'
(percentages). Weights for custom rules are configured using the
pmx-spam weights --set or the pmx-spam pdeltas --set command, as described
below. Weights for custom rules are deleted using the pmx-spam weights --del
or the pmx-spam pdeltas --del command.
(In previous versions of PureMessage, weights were referred to as ``scores'',
and were defined in the same files as rules: spam-tests.conf and
site-spam-tests.conf.)
If both a weight and a probability delta are assigned to a feature, both values are used when messages are scanned for spam. The weight is first converted to a probability, then the value of the probability delta is added to determine the total score.
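The order of operations described above can be sketched as follows. This is only an illustration: the text does not say how a weight is converted to a probability, so the logistic mapping in weight_to_probability is an assumption, and both function names are hypothetical.

```python
import math


def weight_to_probability(weight: float) -> float:
    """Map a weight to a probability.

    ASSUMPTION: a logistic mapping is used purely for illustration; the
    conversion actually used by PureMessage is not described in this page.
    """
    return 1.0 / (1.0 + math.exp(-weight))


def combined_score(weight: float, pdelta: float) -> float:
    """Convert the weight to a probability first, then add the pdelta."""
    return weight_to_probability(weight) + pdelta


# Illustrative values only: a weight of 0.5 and a pdelta of 5% (0.05).
score = combined_score(0.5, 0.05)
```

Under this assumed mapping, a weight of 0 corresponds to a probability of 0.5, so a pdelta of 0.05 shifts the result to 0.55.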
The following sub-commands are recognized by pmx-spam:
pmx-spam weights
The weights command recognizes the following options:
pmx-spam weights --sort name
pmx-spam weights --sort value
pmx-spam weights --sort name --ascending
pmx-spam weights --sort name --descending
pmx-spam weights --trim 0.5
pmx-spam weights --set NAME VALUE
pmx-spam weights --del NAME
Adding the --dynamic option prints the weights for dynamic features instead of
the static features. Dynamic features are computed with the 'train' command.
For example:
pmx-spam weights --sort name --descending --dynamic
By default, the display is sorted by value, in descending order. To
sort the display alphabetically by feature name, or to sort by ascending value,
use the --sort option:
pmx-spam weights --sort name
pmx-spam weights --sort value
To reverse the order, use the --ascending and --descending flags.
Sorting by name defaults to ascending; sorting by value defaults to
descending.
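The default sort directions can be modeled with a short sketch. It is not the pmx-spam implementation: sort_weights is a hypothetical helper, GOOD_SENDER is a made-up feature name, and all of the weight values are invented for the example.

```python
from typing import Dict, List, Optional, Tuple


def sort_weights(
    weights: Dict[str, float],
    key: str = "value",
    descending: Optional[bool] = None,
) -> List[Tuple[str, float]]:
    """Sort (name, weight) pairs the way the --sort defaults are described."""
    if key not in ("name", "value"):
        raise ValueError("key must be 'name' or 'value'")
    if descending is None:
        # Sorting by value defaults to descending; by name, to ascending.
        descending = key == "value"
    index = 0 if key == "name" else 1
    return sorted(weights.items(), key=lambda item: item[index], reverse=descending)


# Hypothetical data for illustration.
example = {"ACCOUNT_CLICK": 0.5, "MY_RULE": 1.2, "GOOD_SENDER": -0.8}
by_value = sort_weights(example)             # highest weight first
by_name = sort_weights(example, key="name")  # alphabetical
```

Passing descending=True or descending=False plays the role of the --descending and --ascending flags in this model.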
To view only weights greater than or equal to a specified threshold, use the
--trim option. Specify a numeric threshold, up to three decimal places. If no
threshold is specified, the default is 0.001.
Note: The --trim option compares the threshold against the absolute value of
each weight. Therefore, the results may include negative weights.
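One reading of the note above is that the threshold is applied to the magnitude of each weight, which is why negative weights can appear in the output. The sketch below models that reading only; trim_weights is a hypothetical helper, GOOD_SENDER is a made-up feature name, and the values are invented.

```python
from typing import Dict

DEFAULT_TRIM = 0.001  # default threshold stated in the text


def trim_weights(weights: Dict[str, float], threshold: float = DEFAULT_TRIM) -> Dict[str, float]:
    """Keep entries whose absolute value is at or above the threshold.

    ASSUMPTION: the comparison uses abs(value); this is an interpretation
    of the note about absolute values, not confirmed behavior.
    """
    return {name: value for name, value in weights.items() if abs(value) >= threshold}


# Hypothetical data for illustration.
example = {"ACCOUNT_CLICK": 0.5, "MY_RULE": 0.0004, "GOOD_SENDER": -0.8}
trimmed = trim_weights(example, 0.5)
```

With these values, the strongly negative GOOD_SENDER entry survives a threshold of 0.5, while MY_RULE is dropped.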
To modify a weight, specify its name and a new value:
pmx-spam weights --set <name> <value>
For example, to set the ACCOUNT_CLICK feature's weight to 0.5, enter:
pmx-spam weights --set ACCOUNT_CLICK 0.5
To delete a weight, use the --del switch:
pmx-spam weights --del <name>
For example:
pmx-spam weights --del MY_RULE
The --set and --del options cannot be used simultaneously.
pmx-spam pdeltas
The pdeltas command has the same arguments as the weights command:
pmx-spam pdeltas --sort name
pmx-spam pdeltas --sort value
pmx-spam pdeltas --sort name --ascending
pmx-spam pdeltas --sort name --descending
pmx-spam pdeltas --trim 0.5
pmx-spam pdeltas --set NAME VALUE
pmx-spam pdeltas --del NAME
By default, the display is sorted by value, in descending order. To
sort the display alphabetically by feature name, or to sort by ascending value,
use the --sort option:
pmx-spam pdeltas --sort name
pmx-spam pdeltas --sort value
To reverse the order, use the --ascending and --descending flags.
Sorting by name defaults to ascending; sorting by value defaults to
descending.
To modify a pdelta, specify its name and a new value:
pmx-spam pdeltas --set <name> <value>
For example, to set the ACCOUNT_CLICK feature's pdelta to 5%, use this command:
pmx-spam pdeltas --set ACCOUNT_CLICK 0.05
To delete a feature's pdelta value, use the --del switch:
pmx-spam pdeltas --del <name>
For example:
pmx-spam pdeltas --del MY_RULE
The --set and --del options cannot be used simultaneously.
pmx-spam scan
The scan command is used to test the anti-spam rule set against one or more
messages. The command displays a list of the spam features identified in
the message and calculates the message's total spam score.
To scan a single message, enter the following command:
pmx-spam scan /tmp/file
If the first line of the file matches the regular expression /^From /,
the file is parsed as a UNIX mbox file; otherwise, the entire file is
considered to be a single message.
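The first-line check can be illustrated with a small sketch. The regular expression is the one given above; everything else is an assumption for illustration, including the function names and the simplified splitting rule, which treats every line beginning with "From " as the start of a new message and ignores the quoting rules of real mbox files.

```python
import re
from typing import List

MBOX_FIRST_LINE = re.compile(r"^From ")  # the pattern quoted in the text


def looks_like_mbox(text: str) -> bool:
    """Return True if the first line matches /^From /."""
    first_line = text.split("\n", 1)[0]
    return MBOX_FIRST_LINE.match(first_line) is not None


def split_messages(text: str) -> List[str]:
    """Split mbox-style text into messages, or return it whole.

    SIMPLIFICATION: any line starting with "From " begins a new message.
    """
    if not looks_like_mbox(text):
        return [text]
    messages: List[str] = []
    current: List[str] = []
    for line in text.splitlines(keepends=True):
        if MBOX_FIRST_LINE.match(line) and current:
            messages.append("".join(current))
            current = []
        current.append(line)
    if current:
        messages.append("".join(current))
    return messages
```

Note that a file whose first line is a "From:" header does not match, because the pattern requires a space immediately after "From".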
To scan every file in a directory, simply specify the directory:
pmx-spam scan /tmp/maildir
Each file in the directory is parsed as if it were specified individually.
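The directory behavior amounts to expanding one argument into a list of files and handling each as if it had been given on its own. A minimal sketch, assuming a non-recursive listing in sorted order (neither detail is stated above) and using the hypothetical helper name expand_target:

```python
from pathlib import Path
from typing import List


def expand_target(path: str) -> List[Path]:
    """Return the files that a single scan argument stands for.

    ASSUMPTIONS: subdirectories are not descended into, and files are
    returned in sorted order; the text does not specify either point.
    """
    target = Path(path)
    if target.is_dir():
        return sorted(p for p in target.iterdir() if p.is_file())
    return [target]
```

Each returned path would then go through the same first-line check as a file named directly on the command line.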
pmx-spam is-spam
pmx-spam not-spam
The is-spam and not-spam commands are used to add messages to the
training database. When the pmx-spam train command is subsequently run,
messages added using the is-spam and not-spam commands are used to
generate a set of adaptive classification rules.
To add messages to the training database, use the following commands:
pmx-spam is-spam /tmp/missed-spam.mbox
pmx-spam not-spam /tmp/important-mail.mbox
If the first line of the file matches the regular expression /^From /,
the file is parsed as a UNIX mbox file; otherwise the entire file is
considered to be a single message.
To add every file in a directory, simply specify the directory:
pmx-spam is-spam /tmp/missed-spam-dir
Each file in the directory is treated as if it were specified individually.
For more information about generating a training database, see 'Creating Anti-Spam Rules via Adaptive Classification Training' in the Policy Configuration section of the Administrator's Reference.
pmx-spam train
The train command is used to generate a set of anti-spam rules
based on adaptive classification techniques. The train command analyzes
the messages in the training database (added with the is-spam and
not-spam commands) and generates a set of rules that express the
characteristics of the training messages. An optimum training database
consists of at least 500 messages of each type, registered with the
is-spam and not-spam commands.
To generate a set of anti-spam rules, use the command:
pmx-spam train
Anti-spam rules generated using the train command are used in addition
to the standard anti-spam rule set included with PureMessage. Training does
not replace the standard rule set; rather, it augments it with rules
derived from site-specific examples of misclassification.
The train command has two options:
pmx-spam train --force
Use the --force option to proceed with training even if your message set
does not meet the recommended 500-message minimum.
pmx-spam train --hints
Use the --hints option to calculate requirements for an optimal message set.
This option counts the 'is-spam' and 'not-spam' messages in the training
database and calculates the number of messages of each type
that is required to achieve a message set that is both sufficiently large
and appropriately balanced for training.
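The kind of arithmetic behind --force and --hints can be sketched as follows. The 500-message minimum comes from the text; the rest is assumed for illustration. In particular, "appropriately balanced" is modeled here as equal counts of each type, which may not match the rule pmx-spam actually applies, and both function names are hypothetical.

```python
from typing import Dict

RECOMMENDED_MINIMUM = 500  # recommended minimum per message type (from the text)


def training_hints(spam_count: int, ham_count: int,
                   minimum: int = RECOMMENDED_MINIMUM) -> Dict[str, int]:
    """Return how many more messages of each type would be needed.

    ASSUMPTION: "balanced" means equal counts, so both types are brought
    up to the larger of the two current counts and the minimum.
    """
    target = max(spam_count, ham_count, minimum)
    return {"is-spam": target - spam_count, "not-spam": target - ham_count}


def can_train(spam_count: int, ham_count: int, force: bool = False,
              minimum: int = RECOMMENDED_MINIMUM) -> bool:
    """Model the minimum-size check, which --force bypasses."""
    return force or (spam_count >= minimum and ham_count >= minimum)
```

For example, with 100 spam and 200 non-spam messages in the database, this model reports a shortfall of 400 and 300 messages respectively, and training would proceed only with force=True.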
After training, the adaptive classification rule set is automatically enabled. To disable the adaptive classification rules, edit the bayes.conf configuration file and set the value of the 'enabled' option to 'no'.
For more information, see 'Creating Anti-Spam Rules via Adaptive Classification Training' in the Policy Configuration section of the Administrator's Reference.
Copyright (C) 2000-2009 Sophos Group. All rights reserved. Sophos and PureMessage are trademarks of Sophos Plc and Sophos Group.
Regular expression support is provided by a modified version of the PCRE library package (see http://www.pcre.org), which is open source software, written by Philip Hazel. Copyright (c) 1997-2003 University of Cambridge.