CGPSA - A Spam Filter for CommuniGate® Pro
Version 1.2.3
Copyright © 2002-2003 TFF Enterprises
Written by Daniel M. Zimmerman
All Rights Reserved
CGPSA works in concert with SpamAssassin™ software
to scan email messages distributed by a CommuniGate®
Pro server. The filter works efficiently, by directly using
the SpamAssassin API. It does not rely on a daemon process such as
spamd or on the execution of shell scripts (as the usual process for
utilizing SpamAssassin with CommuniGate servers does). It can safely
be used with multiple CommuniGate Pro enqueuer threads.
CGPSA supports all features of SpamAssassin, including such
functionality as the use of Razor, DCC, Bayesian
learning, and auto-whitelists. All these features are controlled through
SpamAssassin's regular configuration files. For more information on
SpamAssassin features and configuration, see the SpamAssassin documentation.
Downloads
The most current (release) version of CGPSA can always be downloaded
from http://www.tffenterprises.com/cgpsa/cgpsa.tgz. The current
(release) version is 1.2.3, released 14 November 2003. There is
currently no beta release available.
Documentation
Overview
CGPSA has two basic operating modes, "full-featured" and "headers-only".
Full-Featured Mode
Full-featured mode uses CommuniGate Pro's PIPE functionality to
resubmit messages to the server after scanning, and can also use the
CommuniGate Pro CLI to determine information about message
recipients. In this mode, email is scanned only for the recipients on
the local system, and email for remote systems is always left
unaltered. This is desirable, because SpamAssassin headers added according
to your local policy may interfere with the spam filtering policy of a
remote system. If a particular message is destined for both local and
remote users, the local users receive a scanned copy and the remote
users an unaltered one. Scanned email is treated exactly as
SpamAssassin would treat it; for instance, if the SpamAssassin
preferences say to rewrite the subject line of spam, a scanned spam
will have a rewritten subject line.
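The per-recipient behavior described above can be sketched roughly as follows. This is only an illustration in Python, not CGPSA's actual code (CGPSA is a Perl script, and it can use the CommuniGate Pro CLI to learn about recipients). The function name, the static set of local domains, and the sample addresses are all hypothetical.

```python
def split_recipients(recipients, local_domains):
    """Partition envelope recipients into (local, remote) lists.

    Hypothetical helper: a static domain set stands in for whatever
    the server would report about each recipient.
    """
    local, remote = [], []
    for address in recipients:
        # Compare the domain part case-insensitively.
        domain = address.rpartition("@")[2].lower()
        (local if domain in local_domains else remote).append(address)
    return local, remote


local, remote = split_recipients(
    ["alice@example.com", "bob@elsewhere.example"],
    local_domains={"example.com"},
)
# Local recipients would get the scanned copy; remote ones the original.
```

In this sketch, a message addressed to both groups would be handed on twice: once scanned for `local`, once untouched for `remote`.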
Other advanced functionality, such as the use of individual
SpamAssassin preference files and state files for particular users
(e.g. auto-whitelist, Bayes database), is available in full-featured
mode. At this time, there is no user interface to configure individual
SpamAssassin preferences; however, system administrators with access
to the CommuniGate Pro directory may take advantage of these features
for their own use. Individual user preferences are located in a
.spamassassin directory inside the web directory corresponding to the
account (username.macnt/account.web).
Headers-Only Mode
Headers-only mode does not use CommuniGate Pro's CLI or PIPE
functionality. Instead, it adds headers to messages directly through
CommuniGate's external filter interface. This has the benefit of being
more efficient than using PIPE, because it does not require
resubmission and reprocessing of messages. However, it also eliminates
much of the advanced functionality, such as the use of individual
preference and state files, and the ability to distribute unaltered
messages to remote servers.
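To make the idea of "adding headers through the external filter interface" a little more concrete, here is a rough Python sketch of how a filter might compose its reply. Treat the details as assumptions rather than as CGPSA's implementation: the reply shape `<seq> ADDHEADER "<text>"` and the use of the two-character sequence backslash-e between header lines are my reading of the ADDHEADER and "\e" notes in the revision history, and the function name is hypothetical.

```python
def format_addheader_response(seq, headers, separator="\\e"):
    """Build one response line that asks the server to add headers.

    Assumptions (not confirmed by this document): the response has the
    form '<seq> ADDHEADER "<text>"', and header lines are joined with
    the two-character sequence backslash-e.
    """
    lines = [f"{name}: {value}" for name, value in headers]
    # Escape backslashes and double quotes so the quoted string stays intact.
    escaped = [l.replace("\\", "\\\\").replace('"', '\\"') for l in lines]
    return f'{seq} ADDHEADER "{separator.join(escaped)}"'


print(format_addheader_response(7, [("X-Spam-Flag", "YES"),
                                    ("X-Spam-Level", "*****")]))
```

Because the message body is never rewritten or resubmitted here, the reply is a single short line, which is where the efficiency of headers-only mode comes from.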
System Requirements
CGPSA requires the following in order to run:
- CommuniGate Pro version 4.0 or higher (4.1 or higher preferred)
- Perl version 5.6.1 or higher (required as of CGPSA 1.2.2)
- SpamAssassin version 2.5 or higher
- Additional Perl modules as required by the configuration options you
choose for CGPSA and SpamAssassin (described in the installation steps
below and in SpamAssassin's documentation).
Installation
Installation of CGPSA takes several steps. The following step-by-step
instructions should enable you to get the filter working on your system
(there are special instructions for Win32 systems throughout the steps).
If you can't get it working, ask for help (as described below in Bug Reports, Feature Requests, and the Like).
- Install SpamAssassin on
your system. The way to do this is platform dependent - on FreeBSD, for
example, you might use the FreeBSD Ports system. Also install any
auxiliary programs you want SpamAssassin to use, such as Razor and DCC.
Perl is a prerequisite for SpamAssassin. Separate instructions exist for
installing SpamAssassin on Win32 operating systems. On Win32, you must
add RES_NAMESERVERS to your system environment variables to enable DNS
lookups under Perl (this is discussed in the installation instructions
for SpamAssassin).
- Verify that the path in the first line of the cgpsa script points to
the Perl executable on your system. If you have Perl in a different
location (such as /usr/local/bin/perl or /opt/bin/perl), change the line
accordingly. On Win32, this step is not required.
- Verify that the $cgp_base variable (listed under "Customizable
Variables" near the top of the cgpsa script) contains the correct path
to your CommuniGate Pro Base Directory (hereafter referred to as
"CommuniGate directory"). The default path is /var/CommuniGate on most
systems. On Win32, specify the location with a drive letter, using
forward slashes (not backslashes) as separators: for example,
C:/CommuniGate Files/ (the default base directory on Win32).
- Install the cgpsa script inside your CommuniGate directory, and make
sure it is executable. You can create a new subdirectory for it, or just
put it at the top level; the location does not matter, but make sure you
know the full path to it (/var/CommuniGate/cgpsa, if it is at the top
level of a default CommuniGate directory on UNIX), because you'll need
the path later.
- Install the cgpsa.conf configuration file into the Settings
subdirectory of your CommuniGate directory. Modify any of the settings
you want to change; they are all documented in the configuration file
itself.
- If you are running CGPSA in its full-featured mode with CLI usage
enabled, you must install the CommuniGate Pro CLI Perl Module, CLI.pm. A
version of CLI.pm is included with the CGPSA distribution. You can copy
it to any of your system's Perl @INC directories, or just place it in
the directory where you've installed cgpsa. CLI.pm requires the
Digest::MD5 Perl module; if you do not have it installed on your system,
download and install it (or use the CPAN module to install it). If you
have installed other scripts on your system that require the CommuniGate
Pro CLI, you may already have installed CLI.pm; you do not need to
reinstall it for CGPSA, though you should ensure that it's a recent
version. If you are running in headers-only mode, or in full-featured
mode without the CLI, you need not install CLI.pm for CGPSA to function.
- If you are running CGPSA in its full-featured mode with CLI usage
enabled, you must create a CommuniGate Pro account with PWD access (as
described in the configuration file). Use the username and password
specified in your cgpsa.conf. This user must have
administrative access to all the domains for which you will be using
CGPSA (the easiest way to do this is to give it access to modify all
settings and privileges). Enable a number of PWD connections
greater than the number of enqueuer threads used by your server. See the
configuration file for details. If you are running in headers-only mode,
or in full-featured mode without the CLI, you need not create an account
for PWD access.
- If you are running CGPSA in its full-featured mode with CLI usage
enabled, and your CommuniGate Pro setup has IP addresses specifically
assigned to domains, make sure that either you have specified
"127.0.0.1" as an address associated with one of your domains or you
have changed the "cgp_hostname" setting in cgpsa.conf to refer to a
hostname or IP address on which the CommuniGate server is listening.
- Create the "default home directory", whose path is set in the
configuration file (cgpsa.conf). This is where the default SpamAssassin
preferences and state files will be stored, that is, those that are used
in the absence of individual user preferences.
- Create a SpamAssassin configuration in the default home directory,
if necessary. If the CommuniGate Pro server is the only software on your
system using SpamAssassin, you can just use SpamAssassin's local.cf file
(usually in /etc/mail/spamassassin/) to set up SpamAssassin's
preferences. Otherwise, create a .spamassassin subdirectory inside the
default home directory, containing a user_prefs file with the
SpamAssassin preferences to be used by CGPSA.
- In the CommuniGate Pro web administration interface, go to the
"Helper Settings" page (under "Settings/General"). In an empty "Content
Filtering" section, enter a name for the filter (such as "CGPSA"). For
the Program Path on UNIX, enter the full path to the filter (example:
/var/CommuniGate/cgpsa; see step 4). For the Program Path on Win32,
enter perl followed by the full path to the filter (example:
perl C:/CommuniGate Files/cgpsa). Set the Log level to "Low Level" or
"All Info" (for now). Leave "Time-out" and "Auto-Restart" set to
"Disabled" unless you experience problems with the filter (be optimistic
for the time being). Finally, enable the filter and save your changes.
- Examine the CommuniGate Pro log for the current day, and search
for the string you entered as the filter's name in the previous
step. You should see a line similar to the following: "*
TFF Enterprises CGPSA Filter (Version) Ready". If you do not see this line,
wait a few seconds and look again. If you still do not see it after a
minute or so, something went wrong and there will likely be an error
message in the log. If you can make sense of it and fix the problem,
wonderful; if not, ask for help (as described below in Bug Reports,
Feature Requests, and the Like).
- Go to the "Server-Wide Rules" page of the CommuniGate Pro web
administration interface ("Settings/Rules") and create a rule for
CGPSA. The rule action should be "ExternalFilter CGPSA" (the name you
assigned to the filter two steps ago). No rule conditions are
necessary for proper operation, but greater efficiency will be
attained with the following conditions: "Any Route is LOCAL*" and
"Header Field is not X-TFF-CGPSA-Filter*" (or whatever you've changed
the loop prevention header to). Note that omitting the first of these
conditions means that mail distributed to remote servers will be
scanned (which may not be a good idea - in full-featured mode with the CLI, CGPSA will automatically leave mail for remote servers alone, but in other modes, it scans every message passed to it).
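The file-system prerequisites from the steps above can be sanity-checked with a few lines of Python. This is a rough sketch, not part of CGPSA: the function and argument names are made up for illustration, and it only looks at the paths named in the steps (the CommuniGate directory, the cgpsa script, Settings/cgpsa.conf, and the default home directory). It does not check the CLI account, the PWD settings, or anything inside the configuration file.

```python
import os


def preflight(cgp_base, script_path, default_home_dir):
    """Return a list of problems found; an empty list means none detected.

    Illustrative only: the checks mirror the installation steps, and the
    names used here are not taken from CGPSA itself.
    """
    problems = []
    if not os.path.isdir(cgp_base):
        problems.append(f"CommuniGate directory not found: {cgp_base}")
    if not os.path.isfile(script_path):
        problems.append(f"cgpsa script not found: {script_path}")
    elif not os.access(script_path, os.X_OK):
        problems.append(f"cgpsa script is not executable: {script_path}")
    # cgpsa.conf is expected in the Settings subdirectory.
    conf = os.path.join(cgp_base, "Settings", "cgpsa.conf")
    if not os.path.isfile(conf):
        problems.append(f"configuration file not found: {conf}")
    if not os.path.isdir(default_home_dir):
        problems.append(f"default home directory not found: {default_home_dir}")
    return problems
```

On a default UNIX layout you might call it as `preflight("/var/CommuniGate", "/var/CommuniGate/cgpsa", home)`, where `home` is whatever default home directory you set in cgpsa.conf.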
If all went well, CGPSA is running and you're done. Send a test
message to yourself and examine its headers to see whether it has been
scanned. You may want to change the log level after running CGPSA for
a while, because it does generate quite a lot of output (it's pretty
interesting output, but if you aren't writing/debugging the filter,
much of it isn't too useful).
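If you would rather check the test message programmatically than by eye, something like the following Python sketch would do. It assumes the loop prevention header still has its default name, X-TFF-CGPSA-Filter, and the sample message (including the header value "Scanned") is made up for illustration; the real header may carry more text than this.

```python
from email import message_from_string

# Default loop prevention header name; change it if you renamed the header.
LOOP_HEADER = "X-TFF-CGPSA-Filter"


def scan_status(raw_message, loop_header=LOOP_HEADER):
    """Return the loop prevention header's value, or None if absent."""
    message = message_from_string(raw_message)
    return message.get(loop_header)


# Made-up sample message, used only to exercise the function.
sample = (
    "From: sender@example.com\n"
    "To: me@example.com\n"
    "X-TFF-CGPSA-Filter: Scanned\n"
    "Subject: test\n"
    "\n"
    "body\n"
)
print(scan_status(sample))
```

A result of `None` would suggest the message never passed through the filter, for example because the rule conditions did not match.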
Upgrading
To upgrade from a previous version of CGPSA, perform the following steps:
- Copy the new cgpsa and CLI.pm over the ones you currently have
installed.
- Compare the configuration file (cgpsa.conf) that came with the new
CGPSA to the configuration file you currently have installed. If there
are any new configuration options that you want to
use, add them to your installed configuration file. Alternatively, you
can add your own customizations to the new configuration file and copy
it over the old one.
- In the CommuniGate Pro web administration interface, go to the
"Helper Settings" page (under "Settings/General"). In the "Content
Filtering" section you made for CGPSA, uncheck the checkbox by "Use
Filter", and then click "Update". Then, re-check the checkbox by "Use
Filter" and click "Update" again.
- Examine the CommuniGate Pro log for the current day. You should see
a line that reads "* TFF Enterprises CGPSA Filter (Old Version Number)
Done", and slightly later a line that reads "* TFF Enterprises CGPSA
Filter (New Version Number) Ready". If you don't see this, and it
doesn't show up after a reasonable amount of time, restart your
CommuniGate server (using the startup/shutdown script installed with
CommuniGate). If CGPSA still doesn't start, ask for help (as described
below in Bug Reports, Feature Requests, and the Like).
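Looking for the "Done" and "Ready" lines can be automated with a small Python sketch like the one below. It assumes the version appears in literal parentheses, as in the lines quoted above, and it makes no assumption about whatever timestamp or prefix the CommuniGate Pro log puts in front of the text. The sample lines are invented for the example.

```python
import re

# Matches text such as "* TFF Enterprises CGPSA Filter (1.2.3) Ready".
# The literal parentheses around the version are an assumption.
STATUS_RE = re.compile(
    r"TFF Enterprises CGPSA Filter \((?P<version>[^)]*)\) (?P<state>Ready|Done)"
)


def filter_events(log_lines):
    """Yield (version, state) for each CGPSA startup or shutdown line."""
    for line in log_lines:
        match = STATUS_RE.search(line)
        if match:
            yield match.group("version"), match.group("state")


# Invented sample lines; a real log will have extra text on each line.
sample_log = [
    "* TFF Enterprises CGPSA Filter (1.2.2) Done",
    "some unrelated log line",
    "* TFF Enterprises CGPSA Filter (1.2.3) Ready",
]
events = list(filter_events(sample_log))
```

After a successful upgrade, the last event should be a "Ready" entry carrying the new version number.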
Disclaimer
It is possible that CGPSA contains bugs, although it has already been
in use for months in production systems and we consider it relatively
stable. Any bugs that might exist in CGPSA are unlikely to cause the
loss of email, because CommuniGate Pro is very intelligent about how it
works with external filters - when a filter fails, the message stays
enqueued. However, it is possible that email could be lost.
There is no warranty, express or implied, associated with CGPSA; we will
not be liable for any lost email. Use at your own risk.
License and Fees
The final licensing model for CGPSA has not been determined. It has
very similar functionality to some existing commercial products, and
significant effort has been put into it; it would thus be good to
receive something in return. However, charging licensing fees even
remotely similar to those charged for the aforementioned products
makes no sense. Currently, I'm leaning toward some sort of
shareware/donationware model. If you have any suggestions in this
regard (i.e., how much is the filter worth to you?), let me know.
In the meantime, CGPSA is not to be redistributed without explicit
permission. Its source code may not be used in any other products,
in verbatim or modified form, whether or not the product is
open-source. You may of course modify the source code in any way you
like for use on your own system.
Bug Reports, Feature Requests, and the Like
Suggestions for feature improvements and bug fixes will gladly be
accepted. There is a mailing list for discussion about this filter,
cgpsa-discuss@tffenterprises.com; this mailing list is the primary
channel for support, discussion of feature requests and bug fixes, etc.
It's a standard CGP mailing list: you can join it by emailing
cgpsa-discuss-on@tffenterprises.com. I expect it to be pretty low
traffic; if you don't want to join the list, though, send any questions,
comments, rants, etc. to cgpsa@tffenterprises.com.
When sending a bug report, be sure to include any relevant information
from the CommuniGate Pro log and the cgpsa.err log (located in the same
directory as your cgpsa.conf file).
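One way to grab the end of cgpsa.err for a bug report is sketched below in Python. It assumes cgpsa.conf was installed in the Settings subdirectory of the CommuniGate directory, as the installation steps suggest, so cgpsa.err is looked for there; the function names and the default of 50 lines are arbitrary choices for illustration.

```python
from collections import deque
import os


def tail(path, count=50):
    """Return the last `count` lines of a text file."""
    with open(path, "r", errors="replace") as handle:
        # A bounded deque keeps only the most recent lines in memory.
        return list(deque(handle, maxlen=count))


def error_log_excerpt(cgp_base, count=50):
    """Read the end of cgpsa.err, which sits next to cgpsa.conf in Settings."""
    return "".join(tail(os.path.join(cgp_base, "Settings", "cgpsa.err"), count))
```

For a default UNIX installation, `error_log_excerpt("/var/CommuniGate")` would return the text to paste into the report.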
Revision History
- 1.2.3 - 14 November 2003
Bug Fixes
- Removed an unintentional dependency on Time::HiRes that was introduced
in version 1.2.2 (and caused CGPSA to crash).
- 1.2.2 - 14 November 2003
Changes
- Added configuration options for specifying the install prefix,
system rules directory, and custom rules directory for SpamAssassin.
This should allow CGPSA to work properly, without hackery such as
duplicating entire directory trees, on systems such as Win32 where
current versions of SpamAssassin do not seem to correctly compile paths
into SpamAssassin.pm.
- Added code to use Time::HiRes, if present, to give more accurate timing
for SpamAssassin processing.
- Perl 5.6.1 is now required to use CGPSA. This prevents various bugs
in Perl 5.6.0 from affecting CGPSA's text processing.
Bug Fixes
- Fixed an issue where pathnames with spaces could not be used in the
cgpsa.conf file (this was especially problematic for Windows
users).
- 1.2.1 - 2 October 2003
Bug Fixes
- Fixed an issue where we inserted a duplicate X-Spam-Checker-Version header under SpamAssassin 2.6 in headers-only mode.
- Fixed an issue where we sometimes, while trying to get rid of SpamAssassin headers from other servers, ended up duplicating our own SpamAssassin headers.
- 1.2 - 17 September 2003
Changes
- Perl 5.6.0 or higher is now required to run CGPSA (this was
unintentionally the case before, but CGPSA will die with a much more
intuitive message now)
- SpamAssassin 2.5 or higher is now required to run CGPSA.
- The "max_requests" configuration option, which causes CGPSA to kill
itself after processing a certain number of requests (to help combat memory
leaks on systems where "parallel_requests" can't be used), has been
added.
- Spurious X-Spam-* headers (that is, those added by SpamAssassin
running on other servers) are now removed from messages when running in
full-featured mode.
- Headers-only mode has been made compatible with SpamAssassin 2.60 series
releases.
- "\e" has replaced "\n" as the line separator in headers constructed by
CGPSA.
Known Issues
- 1.1 - 27 July 2003
Changes
- Parallel requests mode is now turned on by default on non-Windows
platforms.
- Modified the processing of message headers to preserve more
information about the Envelope-To addresses.
- Added the ability to specify a list of destination domains that should
always have their mail scanned, with the "scan_domains" setting. This makes
it possible to use CGP as a relay/gateway server (to filter and process
email for other domains).
- Added date/time stamp to the standard error output generated by
CGPSA. Output generated by other Perl code (such as SpamAssassin) is not
date/time stamped.
- Changed CGPSA's output functions to wrap lines to a reasonable number of
characters (around 150), to work around the CommuniGate log line length
limit.
- Added signal handling: if the main CGPSA process receives a HUP, it
reloads all of its own and SpamAssassin's preferences before processing
the next CLI command it receives.
- Partially worked around the behavior whereby, when parallel_requests is
off and use_user_prefs is on, users' SpamAssassin preferences get "stacked"
on top of each other. Now, the SpamAssassin preferences in the default home
directory are loaded before each set of user preferences, so they are
"stacked" on top of the user prefs; this means that, if the default
preferences file contains explicit settings for every parameter changed in a
user preference file, SpamAssassin will return to these defaults between
users.
- Worked around SpamAssassin's problem with extremely large messages, by
only scanning the first 250K of each message.
- Changed the location of the per-user configurations used by CGPSA (old
configurations will automatically be moved to the new location).
Known Issues
- If CGPSA has to truncate a message in order to scan it, that message
will almost certainly trigger the SpamAssassin rule
"MIME_MISSING_BOUNDARY". This probably won't cause non-spam messages to
cross the spam threshold, but it is a possibility.
- When CGPSA catches a HUP signal and reloads its preferences, it leaks
some memory (as a result of the old SpamAssassin, or parts thereof, not
going away). This cannot be fixed with the current version of SpamAssassin;
work is underway on a patch to SpamAssassin to enable a fix.
- Entries in the CGP log that are line-wrapped have "\t" as their leading
character when CGPSA is used with Perl earlier than 5.8. This is a result of
limitations in the older Text::Wrap code.
- 1.0.8 - 18 June 2003
Bug Fixes
- Fixed an issue with the generation of random filenames that could have,
on certain systems, resulted in message loss when using parallel
requests.
- 1.0.7 - 17 June 2003
Changes
- Added the ability to use per-user preferences in either the old
location (the user's account directory) or the new location (the user's
"account.web" directory). This is primarily so that users of the 1.1 betas
can have a stable version to revert to without manually moving preferences
back and forth.
Bug Fixes
- Fixed a bug where mail for "all@" and "alldomains@" addresses could have
been delivered when it shouldn't have been. Mail to "all@" and "alldomains@"
addresses is now scanned in ADDHEADER mode, to preserve CGP's security
model for such mail.
- 1.0.6 - 5 May 2003
Changes
- Added an option (turned on by default) to redirect SpamAssassin's
error output to a file rather than to standard error. This solves a
problem that would cause the filter to hang on certain operating
systems (including Mac OS X).
- Added information to log output about the paths to the SpamAssassin
settings files being used; the path to the default settings file is output
at filter startup time, and the paths to user settings files are output
when those files are used.
Bug Fixes
- Fixed a race condition where CGPSA would check for the ability
to connect to the CLI before the CommuniGate server was ready to accept
connections to the PWD port.
- Fixed the entry for "default_home_dir" in the configuration file to
accurately reflect its default value.
- 1.0.5 - 14 April 2003
Changes
- Added some extra debugging output in various places, including
printing of the preferences file location and the SpamAssassin version
at startup.
- Added support for the new "Route Mail" CLI command option (for
CommuniGate 4.1b3 and higher; compatibility is auto-detected).
Bug Fixes
- Fixed a problem where mail to the "all@" and "alldomains@"
addresses would be bounced because of routing errors.
- Fixed a problem where CGPSA would use root's home directory to
store SpamAssassin state files in certain configurations.
- Fixed a problem where the "cgp_hostname" setting was not being
read from the configuration file.
- 1.0.4 - 2 April 2003
Changes
- Added the "use_c_locale" configuration option, to force the C locale to be used by Perl when running SpamAssassin (which has some known problems with certain locales). The default is to use the C locale.
- Added more logic to the headers-only mode, so that we now produce the same headers as SpamAssassin would if it were rewriting the message itself.
- Changed the text of the loop prevention header to read either "Scanned" or "Scan Failed", as appropriate (rather than "Attempted").
- Fixed a problem that occurred with SpamAssassin's "report_safe" option turned on, where an email identified as spam would be processed in an infinite loop.
- Added the "direct_mailbox_rewrite" and "direct_mailbox_scan" configuration options, and removed the "direct_mailbox_passthrough" configuration option. This allows for more flexibility when determining the scanning policy for direct mailbox addresses. The default settings are to not rewrite direct mailbox addresses, and to not scan mail for direct mailbox addresses.
- 1.0.3 - 30 March 2003
Changes
- Added the "parallel_requests" setting, which defaults to "false". This addresses problems on Win32, and potentially on other platforms as well, with the mechanism currently used to process multiple emails in parallel.
- Changed the default settings for "use_user_prefs", "require_user_prefs", and "use_user_state" to "no".
- Removed OS name and Perl version from X-TFF-CGPSA-Version header.
- Added clarification in the configuration file that the default home directory setting should not be quoted.
Bug Fixes
- Fixed a problem where the "headers_only" setting was not read properly.
- Fixed a problem where, if there was no default SpamAssassin preferences file and user preferences were not being used, no mail would be scanned.
- 1.0.2 - 27 March 2003
Bug Fixes
- Fixed a problem where the CGP CLI module wouldn't load properly if it wasn't found as "CGP/CLI.pm".
- Fixed a problem where, under certain circumstances, Perl's Taint Mode (security infrastructure) prevented CGPSA from removing some temporary files used in SpamAssassin initialization.
- Fixed a non sequitur in the documentation.
- 1.0.1 - 27 March 2003
Changes
- Added the "install CGP::CLI" step to the installation instructions.
- Added runtime check for CGP::CLI module, so that it need not be
installed if CGPSA is running with "use_cli" set to false (or in headers-only mode). The module can be installed as CGP/CLI.pm or just CLI.pm.
- 1.0 - 23 March 2003
Future Plans
I'd like a better name to replace "CGPSA". If
anybody has any suggestions, I'd really appreciate hearing them...
A hypothetical future version of CGPSA will have a user interface
to support individual auto-whitelists, Bayes databases, and
SpamAssassin preferences for CommuniGate users. This hypothetical
version of CGPSA will also be refactored to be somewhat more
object-oriented and modular (because 1000-line Perl scripts are not
the easiest things in the world to maintain).
Legal Stuff
CommuniGate is a registered trademark of Stalker Software, Inc. SpamAssassin is a registered trademark of Deersoft, Inc.
Last modified by Daniel M. Zimmerman on 14 November 2003