NAME

pmx-spam - Interface to the PureMessage anti-spam component


SYNOPSIS

 pmx-spam <command> [options]


DESCRIPTION

The pmx-spam program is used to view and set weights for anti-spam rules. It is also used to generate adaptive classification rules via 'training'. pmx-spam can be used to scan individual messages and display the matching rules.

Rule tests are defined in the *.rules files located in the /opt/pmx/etc/data/antispam directory. Customer-specific overrides or new rules can be found in the /opt/pmx/etc/spam.d directory.

The anti-spam engine in PureMessage uses 'features' and 'weights', two separate attributes of message scanning.

  1. Features

    A ``feature'' is a characteristic of a message. If a message has a feature, it is similar in some way to other messages that have the same feature. Similarly, a message without a particular feature is different from a message with that feature.

    The presence of a feature in a message is determined by testing components of the message against regular expressions defined in /opt/pmx/etc/data/antispam/re.rules. Custom features can be defined in the file /opt/pmx/etc/spam.d/re.rules.

  2. Weights

    Each feature has an associated ``weight''. A feature's weight expresses the likelihood that a message containing that feature is spam. The higher the weight, the more likely the presence of the feature is indicative of spam. Conversely, negative weights express the likelihood that a message is not spam; the lower the negative number, the more likely that the message is not spam.

    Weights can be expressed as either numerical values or 'probability deltas' (percentages). Weights for custom rules are configured using the pmx-spam weights --set or the pmx-spam pdeltas --set command, as described below. Weights for custom rules are deleted using the pmx-spam weights --del or the pmx-spam pdeltas --del command. (In previous versions of PureMessage, weights were referred to as ``scores'', and were defined in the same files as rules: spam-tests.conf and site-spam-tests.conf.)

    If both a weight and a probability delta are assigned to a feature, both values are used when messages are scanned for spam. The weight is first converted to a probability, then the value of the probability delta is added to determine the total score.
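
The order of operations described above (convert the weight to a probability, then add the probability delta) can be sketched as follows. This manual does not say how a weight is converted to a probability; the logistic function used here is purely an assumption for illustration, and the function names are hypothetical rather than part of PureMessage.

```python
import math


def weight_to_probability(weight: float) -> float:
    # ASSUMPTION: a logistic curve maps a weight to a probability.
    # The conversion PureMessage actually uses is not documented here.
    return 1.0 / (1.0 + math.exp(-weight))


def feature_score(weight: float, pdelta: float) -> float:
    # Convert the weight first, then add the probability delta,
    # following the order given in the text.
    return weight_to_probability(weight) + pdelta
```

Under this assumed conversion, a weight of 0 maps to a probability of 0.5, so a probability delta of 0.05 would give a score of about 0.55.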


COMMANDS

The following sub-commands are recognized by pmx-spam:

weights
The weights command is used to view or alter the weight associated with a feature. To view all features and their assigned weights, enter:
   pmx-spam weights

The weights command recognizes the following options:

   pmx-spam weights --sort name
   pmx-spam weights --sort value
   pmx-spam weights --sort name --ascending
   pmx-spam weights --sort name --descending
   pmx-spam weights --trim 0.5
   pmx-spam weights --set NAME VALUE
   pmx-spam weights --del NAME

Adding the --dynamic option prints the weights for dynamic features instead of static features. Dynamic features are computed with the 'train' command.

For example:

   pmx-spam weights --sort name --descending --dynamic

By default, the display is sorted by value, in descending order. To choose the sort key explicitly, either alphabetically by feature name or numerically by value, use the --sort option:

   pmx-spam weights --sort name
   pmx-spam weights --sort value

To reverse the order, use the --ascending and --descending flags. Sorting by name defaults to ascending; sorting by value defaults to descending.
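
The default directions described above can be modelled with a short sketch. It operates on an in-memory mapping of feature names to weights and does not parse pmx-spam output; the function name and the sample feature names in the usage note are hypothetical.

```python
def sort_features(weights, sort="value", descending=None):
    # Defaults follow the text: sorting by name is ascending,
    # sorting by value is descending.
    if sort not in ("name", "value"):
        raise ValueError("sort must be 'name' or 'value'")
    if descending is None:
        descending = (sort == "value")
    index = 0 if sort == "name" else 1
    return sorted(weights.items(), key=lambda item: item[index], reverse=descending)
```

For instance, sort_features({"B_RULE": 0.2, "A_RULE": -1.0}, "name") lists A_RULE first, while the same call without a sort key lists B_RULE first because it has the higher value.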

To view weights greater than or equal to a specified threshold, use the --trim option. Specify a numeric threshold, up to three decimal places. If no threshold is specified, the default is 0.001.

Note: The --trim option compares the absolute value of each weight against the threshold. Therefore, the results may include negative weights.
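
One reading of the note above is that the threshold is applied to the magnitude of each weight, which is why negative weights can appear in the results. The sketch below illustrates that reading only; it is an assumption about the behavior, not the pmx-spam implementation, and the function name is hypothetical.

```python
def trim(weights, threshold=0.001):
    # Keep a feature when the magnitude of its weight meets the threshold,
    # so strongly negative weights pass the filter as well.
    # The default of 0.001 is the default threshold stated in the text.
    return {name: value for name, value in weights.items() if abs(value) >= threshold}
```

With a threshold of 0.5, a feature weighted -0.75 would be kept and a feature weighted 0.0004 would be dropped.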

To modify a weight, specify its name and a new value:

   pmx-spam weights --set <name> <value>

For example, to set the ACCOUNT_CLICK feature's weight to 0.5, enter:

   pmx-spam weights --set ACCOUNT_CLICK 0.5

To delete a weight, use the --del switch.

   pmx-spam weights --del <name>

For example:

   pmx-spam weights --del MY_RULE

The --set and --del options cannot be used simultaneously.

pdeltas
The pdeltas command is used to view or alter the probability delta (percentage) associated with a feature. To view all features and their assigned probability deltas, enter:
   pmx-spam pdeltas

The pdeltas command recognizes the same options as the weights command:

   pmx-spam pdeltas --sort name
   pmx-spam pdeltas --sort value
   pmx-spam pdeltas --sort name --ascending
   pmx-spam pdeltas --sort name --descending
   pmx-spam pdeltas --trim 0.5
   pmx-spam pdeltas --set NAME VALUE
   pmx-spam pdeltas --del NAME

By default, the display is sorted by value, in descending order. To choose the sort key explicitly, either alphabetically by feature name or numerically by value, use the --sort option:

   pmx-spam pdeltas --sort name
   pmx-spam pdeltas --sort value

To reverse the order, use the --ascending and --descending flags. Sorting by name defaults to ascending; sorting by value defaults to descending.

To modify a pdelta, specify its name and a new value:

   pmx-spam pdeltas --set <name> <value>

For example, to set the ACCOUNT_CLICK feature's pdelta to 5%, use this command:

   pmx-spam pdeltas --set ACCOUNT_CLICK 0.05

To delete a feature's pdelta value, use the --del switch.

   pmx-spam pdeltas --del <name>

For example:

   pmx-spam pdeltas --del MY_RULE

The --set and --del options cannot be used simultaneously.

groups
The groups command is used to view the Feature Groups defined by the anti-spam heuristics currently installed. A group can be enabled or disabled by modifying the appropriate file in /opt/pmx/etc/spam.d/groups.d/.

scan
The scan command is used to test the anti-spam rule set against one or more messages. The command will display a list of the spam features identified in the message, and will calculate the message's total spam score.

To scan a single message, enter the following command:

   pmx-spam scan /tmp/file

If the first line of the file matches the regular expression /^From /, the file is parsed as a UNIX mbox file; otherwise, the entire file is considered to be a single message.

To scan every file in a directory, simply specify the directory:

   pmx-spam scan /tmp/maildir

Each file in the directory is parsed as if it were specified individually.
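
The file-handling rules above (an mbox file is recognized by its first line, and a directory stands for the files it contains) can be sketched as follows. The function names are hypothetical, and the choice not to descend into subdirectories is an assumption, since the text does not say how nested directories are treated.

```python
import os
import re

# The pattern quoted in the text: the first line must start with "From ".
MBOX_FIRST_LINE = re.compile(rb"^From ")


def is_mbox(path):
    # Only the first line decides: "From " at its start means mbox,
    # anything else means the whole file is one message.
    with open(path, "rb") as handle:
        return MBOX_FIRST_LINE.match(handle.readline()) is not None


def expand_target(path):
    # A directory stands for each regular file directly inside it.
    # ASSUMPTION: subdirectories are not searched recursively.
    if os.path.isdir(path):
        entries = sorted(os.path.join(path, name) for name in os.listdir(path))
        return [entry for entry in entries if os.path.isfile(entry)]
    return [path]
```

Note that a message beginning with a "From:" header line does not match the pattern, because the pattern requires a space directly after "From".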

is-spam
not-spam
The is-spam and not-spam commands are used to add messages to the training database. When the pmx-spam train command is subsequently run, messages added using the is-spam and not-spam commands are used to generate a set of adaptive classification rules.

To add messages to the training database, use the following commands:

   pmx-spam is-spam /tmp/missed-spam.mbox
   pmx-spam not-spam /tmp/important-mail.mbox

If the first line of the file matches the regular expression /^From /, the file is parsed as a UNIX mbox file; otherwise the entire file is considered to be a single message.

To add every file in a directory, simply specify the directory:

   pmx-spam is-spam /tmp/missed-spam-dir

Each file in the directory is treated as if it were specified individually.

For more information about generating a training database, see 'Creating Anti-Spam Rules via Adaptive Classification Training' in the Policy Configuration section of the Administrator's Reference.

train
The train command is used to generate a set of anti-spam rules based on adaptive classification techniques. The train command analyzes the messages in the training database (added with the is-spam and not-spam commands), and generates a set of rules that express the characteristics of the training messages. An optimum training database consists of at least 500 messages of each type registered with the is-spam and not-spam commands.

To generate a set of anti-spam rules, use the command:

   pmx-spam train

Anti-spam rules generated using the train command are used in addition to the standard anti-spam rule set included with PureMessage. Training does not replace the standard rule set; rather, it augments it with rules derived from site-specific examples of misclassification.
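
The overall workflow (register examples with is-spam and not-spam, then run train) can be expressed as a list of command lines. The sketch below only builds the argument lists from the sub-commands documented here and executes nothing; the helper name is hypothetical.

```python
def training_commands(spam_paths, ham_paths):
    # Build argument lists only; nothing is executed here.
    commands = [["pmx-spam", "is-spam", path] for path in spam_paths]
    commands += [["pmx-spam", "not-spam", path] for path in ham_paths]
    # Training comes last, once the examples have been registered.
    commands.append(["pmx-spam", "train"])
    return commands
```

On a host where pmx-spam is installed, each list could be passed to subprocess.run in order, stopping at the first command that fails.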

The train command has two options:

   pmx-spam train --force

Use the --force option to proceed with training even if your message set does not meet the recommended minimum of 500 messages of each type.

   pmx-spam train --hints

Use the --hints option to calculate requirements for an optimal message set. This option counts the messages added to the training database with the is-spam and not-spam commands, and calculates how many more messages of each type are required to achieve a message set that is both sufficiently large and appropriately balanced for training.
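
The kind of calculation that --hints performs might look like the sketch below. The 500-message figure comes from this manual, but the text does not define "appropriately balanced"; treating it as equal counts of both types is an assumption made here for illustration, and the function name is hypothetical.

```python
RECOMMENDED_MINIMUM = 500  # per-type figure given in the text


def hints(spam_count, ham_count, minimum=RECOMMENDED_MINIMUM):
    # ASSUMPTION: "balanced" means equal counts of both message types.
    # The target is the larger of the minimum and the bigger existing set.
    target = max(minimum, spam_count, ham_count)
    return {"is-spam": target - spam_count, "not-spam": target - ham_count}
```

With 120 spam examples and 650 non-spam examples, this sketch would suggest adding 530 more spam messages and no further non-spam messages.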

After training, the adaptive classification rule set is automatically enabled. To disable the adaptive classification rules, edit the bayes.conf configuration file and set the value of the 'enabled' option to 'no'.

For more information, see 'Creating Anti-Spam Rules via Adaptive Classification Training' in the Policy Configuration section of the Administrator's Reference.


COPYRIGHT

Copyright (C) 2000-2008 Sophos Group. All rights reserved. Sophos and PureMessage are trademarks of Sophos Plc and Sophos Group.

Regular expression support is provided by a modified version of the PCRE library package (see http://www.pcre.org), which is open source software, written by Philip Hazel. Copyright (c) 1997-2003 University of Cambridge.