Numerous articles have focused on ways to recognize and block spam email. As system administrators work to build sophisticated roadblocks, spammers continue to
find ways to knock them down. This article will focus on one viable solution,
SpamShield version 1.40 -- a Perl-based spam filter for sendmail. We'll cover how it works and how to install and configure it on your server.
The science of spam (if you can call it that) has taught us one thing: spam leaves a definite "calling card" in the system logs. This calling card is generally repetitive enough that the process of tracking spam may be automated. Based on this theory, a brilliant programmer by the name of Kai Schlichting wrote a Perl-based program called SpamShield.
United States federal and state laws now prohibit the transmission of spam, but these are only laws, and where laws exist, so do criminals intent on breaking them. We must also remember that U.S. laws apply only in the U.S., not in other countries. The Internet is a global playing field, and laws concerning its use are created and enforced on a country-by-country basis. For example, in China, India, and Romania, spamming is legal.
Spam email usually originates from profit-based organizations that purchase "spam lists" from "list sellers" located in various corners of the Internet. Legitimate and illegitimate companies make use of them. I've received email offers promising everything from credit repair to instant university diplomas (I don't require credit repair and I already have my diploma -- thanks). Some of the more legitimate companies include credit card firms, car manufacturers, and drug companies.
All too often, you may see a message at the bottom of the spam that says something like this:
"If you would rather not receive these messages, please click here. It will take up to 48 hours for your request to take effect. All third party products and services promoted on this Site are offered exclusively by third party advertisers. XXX Company makes no representations or warranties with respect to these offers and all claims for injury and damages related to such offers are the sole responsibility of the advertiser."
Talk about integrity.
There are endless ways to harvest email addresses on the Internet. In most cases, list suppliers build their lists with programs that scan Web sites and databases for strings containing the "@" symbol. Each address is then verified with SMTP verification software and compiled into a portable database.
In other cases, requesting that you be removed from a list (as in the example above) verifies that your address exists; hence, you're added to more lists.
The most common method used to send spam is through an "open" relay host. An open relay is an SMTP server that accepts mail from any host on port 25 and relays it on to any other domain, even when neither the sender nor the recipient is local. The engineers at sendmail.org have worked for several years to find ways to reject unauthorized relaying, using filtering methods such as the access database.
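To make the open-relay distinction concrete, here is a minimal Perl sketch of a closed-relay decision. It is an illustration only, not sendmail's implementation: the sub name relay_allowed, the local domains, and the trusted network prefixes are all hypothetical. A closed relay accepts a message only if the recipient domain is local or the client sits on a trusted network (roughly what RELAY entries in sendmail's access database express); an open relay effectively says yes to everything.

```perl
use strict;
use warnings;

# Hypothetical configuration for this sketch only.
my %local_domains  = map { $_ => 1 } qw(example.com mail.example.com);
my @relay_prefixes = ('127.', '192.168.1.');   # assumed trusted networks

# Return 1 if mail from $client_ip to $rcpt may be accepted, 0 otherwise.
sub relay_allowed {
    my ($client_ip, $rcpt) = @_;
    my ($domain) = $rcpt =~ /\@([^@>\s]+)>?\s*$/ or return 0;
    return 1 if $local_domains{lc $domain};           # inbound mail for us
    for my $prefix (@relay_prefixes) {
        return 1 if index($client_ip, $prefix) == 0;  # trusted client
    }
    return 0;   # an open relay would return 1 here instead
}
```

Matching prefixes that end in a dot keeps 192.168.1. from also matching 192.168.10.x.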
Newer versions of sendmail look up the sender's domain in DNS before allowing mail to
pass. If that domain doesn't exist, sendmail will typically reject the
message. This blocks spam from sources that put nonexistent domains in their
envelope sender (return) address.
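The sender-domain test described above can be sketched in Perl as follows. This is a simplified stand-in, not sendmail's code: sendmail consults DNS itself, while the hypothetical sender_domain_ok sub asks the system resolver through gethostbyname (address lookups only) and accepts an optional resolver callback so it can be tried without network access. The empty sender <> used by bounce messages is let through.

```perl
use strict;
use warnings;

# Return 1 if the envelope sender's domain resolves, 0 otherwise.
# $resolver is optional; by default gethostbyname() is used, which
# returns an empty list when the name cannot be resolved.
sub sender_domain_ok {
    my ($sender, $resolver) = @_;
    $resolver ||= sub { my @addr = gethostbyname($_[0]); return @addr ? 1 : 0 };
    return 1 if $sender eq '' || $sender eq '<>';   # bounces carry no sender
    my ($domain) = $sender =~ /\@([^@>\s]+)>?$/ or return 0;
    return $resolver->($domain) ? 1 : 0;
}
```

Injecting the resolver keeps the decision logic separate from DNS, which is also what makes it easy to test.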
The basic principle of SpamShield is fairly straightforward. First, it gathers a "chunk" of recent log information. Next, it checks whether that chunk shows more email arriving from any single source (such as "spamdomain.com") than a predetermined threshold value allows. Once a source exceeds the threshold, SpamShield simply blocks it from further access.
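The counting step lends itself to a short Perl sketch. It assumes the source IP of each message in the log chunk has already been extracted; the hypothetical sources_over_threshold sub tallies messages per IP and returns every source whose count is more than the threshold, busiest first. SpamShield's own code may be organized differently.

```perl
use strict;
use warnings;

# Tally messages per source IP and return the sources that sent more
# than $threshold messages, sorted by volume (then by IP for stability).
sub sources_over_threshold {
    my ($threshold, @ips) = @_;
    my %count;
    $count{$_}++ for @ips;
    return sort { $count{$b} <=> $count{$a} || $a cmp $b }
           grep { $count{$_} > $threshold } keys %count;
}

# One flooding host, one busy but normal host, one occasional sender.
my @seen = (('192.0.2.45') x 250, ('198.51.100.7') x 12, '203.0.113.9');
print "$_\n" for sources_over_threshold(200, @seen);   # prints 192.0.2.45
```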
Through experience, I learned that setting the threshold value is the most important part of making SpamShield run efficiently.
Here is a description of how SpamShield works, quoted from Kai's readme document:
"SpamShield looks at the last <n> lines of the sendmail logfile (maillog), and builds a list of mail volume received from various hosts (by IP) in the period covered by that log fragment. If any particular machine sends more mail than the configured global threshold, the assumption is that spam is received. The IP address is then dropped to a "dead host" (an unused IP address within your netblock). The defaults for the log file fragment and the allowable number of mails per host are for a small system with only a few thousand mails per day. You might want to adjust those limits to avoid false positives. (see set $spamthreshold ). The general assumption is that spam abuse typically means that up to several hundred emails PER MINUTE are received from a single source: this is a tremendous 'signal to noise' ratio, given that even very large systems, such as AOL's mail servers, don't deliver more than a few hundred mails to a small/medium-sized system per day. For this reason, there are configuration options to ignore 'spam-like' traffic from high-traffic hosts that are deemed secure and non-relaying (AOL's servers don't relay, for example)."
Installation of Kai's software is simple. SpamShield is a Perl script, so you'll need Perl 5, available from Perl.com.
First, download the tarball from Kai's site to your src directory, then untar it. The uncompressed directory structure will look like this:
spamcontrol/
spamcontrol/blocked
spamcontrol/INSTALL-spamshield
spamcontrol/spamshield.pl
spamcontrol/dontblock
spamcontrol/blockignore
Next, move the spamcontrol directory somewhere more convenient, such as /usr/local/spamcontrol:
Command: mv ./spamcontrol /usr/local/spamcontrol
The Perl script, spamshield.pl, should be mode 700 and owned by root:wheel:
Command: chmod 700 /usr/local/spamcontrol/spamshield.pl ; chown root:wheel /usr/local/spamcontrol/spamshield.pl
Please review INSTALL-spamshield, located in the spamcontrol directory, for a detailed installation overview.
Note that this configuration example is BSD-specific, in that we use /var/log/maillog for
all of sendmail's log messages. Some other Unix variants log mail to /var/log/messages. This
option is configurable in the syslog.conf file on most systems. For more help
with syslog, see Michael Lucas's ONLamp article on syslog configuration.
Now edit the Perl script, spamshield.pl, using your favorite editor. I suggest
you use a "long line" editor, such as vi. Follow these steps:
Point the Perl path in the first line to the location of your Perl 5 interpreter. In most cases, this will be /usr/bin/perl; on other systems, it may be /bin/perl or /usr/local/bin/perl. The line should look something like this: #!/usr/bin/perl. (For those of you who have not done any Perl scripting: do not remove the hash mark (#) before the exclamation point (!). It belongs there, and does not mean the line is commented out.)
To find the path where Perl resides, enter:
Command: which perl
Set $log to the location of your sendmail logfile: /var/log/maillog in most cases (or /var/log/messages).
Set $lastlines to the number of most recent log file lines you want the program to look at. The default is 1500, representing 4-8 hours of mail on a small system.
Set $spamthreshold to the number of emails that may be received from any single source IP within the number of lines configured above before the source is considered a spamming host.
Set $dontblock to a file containing a plain list of IP hosts, one per line, that are never to be blocked. This includes, for example, your own IP number and that of the loopback interface (127.0.0.1). Warnings about spam from the hosts listed will still be mailed out!
Set $blockactive to the file that records which blocks are currently active. You should manually edit this file after a spam has been dealt with, so that the program ignores future connections from this host.
Set $blockignore to a file containing a plain list of IP hosts, one per line, that SpamShield will never complain about or take any action against. These are usually all of your own mailhosts, if they relay mail to each other, and are usually hosts that run SpamShield themselves. This prevents a spammer from creating a "spam storm" in which your mail servers start blocking each other.
Set $securetmp to a directory (the default is /usr/local/spamcontrol/) where temporary files can be created safely; that is, a directory owned by the owner of this program that no one else has permission to write to.
Set $blackhole to an unused IP number on your local subnet; otherwise you will get errors saying the route is not reachable. All traffic to undesired hosts is routed to this address, so take care not to use this IP number for anything else. Leave it undefined (commented out) if you do not wish to use IP blocking.
Set $maintainer to a comma-separated list of email addresses that are to be notified of any spam activity. Note that @ must be escaped as \@ in a double-quoted Perl string. Leave it undefined (commented out) to send no mail to anyone.
Set the locations of the helper programs, which vary between systems:
$SENDMAIL="/usr/sbin/sendmail";
$TAIL="/usr/bin/tail";
$AWK="/usr/bin/awk";
$GREP="/usr/bin/grep";
$SORT="/usr/bin/sort";
$CAT="/bin/cat";
$DATE="/bin/date";
$ROUTE="/sbin/route";
$WINNUKE="/usr/local/spamcontrol/winnuke"; (optional, used to crash Windows
systems that send spam; only use with discretion)
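The interplay of $blockignore and $dontblock can be summarized in a small Perl sketch. The sub names load_ip_list and action_for are hypothetical, and treating blank lines and # comments as ignorable is an assumption rather than documented SpamShield behavior.

```perl
use strict;
use warnings;

# Read a list file: one IP per line. Skipping blank lines and "#" comments
# is an assumption of this sketch, not documented SpamShield behavior.
sub load_ip_list {
    my ($fh) = @_;
    my %ips;
    while (my $line = <$fh>) {
        $line =~ s/#.*//;           # drop an (assumed) comment
        $line =~ s/^\s+|\s+$//g;    # trim surrounding whitespace and newline
        $ips{$line} = 1 if length $line;
    }
    return \%ips;
}

# Decide what to do about a source that exceeded $spamthreshold:
#   listed in blockignore -> stay silent and take no action
#   listed in dontblock   -> mail a warning, but never block
#   anything else         -> mail a warning and block the host
sub action_for {
    my ($ip, $ignore, $dontblock) = @_;
    return 'ignore' if $ignore->{$ip};
    return 'warn'   if $dontblock->{$ip};
    return 'warn+block';
}

# Usage with the files from the configuration:
# open my $fh, '<', '/usr/local/spamcontrol/dontblock' or die "dontblock: $!";
# my $dontblock = load_ip_list($fh);
```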
Here is the user-defined section of my customized script:
#####################################################
# User-defined parts below #
#####################################################
$log = "/var/log/maillog";
# sendmail log location
$lastlines=1500;
# how many lines at the end of the log should we look at
$spamthreshold=200;
# this is how many mails can be seen from a single IP
# in the last $lastlines lines in the logfile before
# considering it spam. Adjust this to accommodate
# busy systems and events like coming up after a
# long downtime (when a lot of mail will be delivered
# from various hosts or from the secondary MX)
$dontblock="/usr/local/spamcontrol/dontblock";
# list of IP hosts that
# are never to be blocked
$blockactive="/usr/local/spamcontrol/blocked";
# these hosts are currently
# blocked by SpamShield
# for sysadmin review
$blockignore="/usr/local/spamcontrol/blockignore";
# be silent about these ones
$securetmp="/usr/local/spamcontrol";
# enter directory name that cannot be
# used by anyone except the uid under
# which this program is run
$blackhole="209.204.146.22";
# this **MUST** be an unused IP number on the
# local network, or error messages and chaos
# might ensue. undefine to not add a route,
# this should only be used on machines with
# known stable routing engines.
# who will receive alerts ? undefine to stop mail alerts
$maintainer="glenn\@networkinformation.com";
# define locations of programs below, systems vary
$SENDMAIL="/usr/sbin/sendmail";
$TAIL="/usr/bin/tail";
$AWK="/usr/bin/awk";
$GREP="/usr/bin/grep";
$SORT="/usr/bin/sort";
$CAT="/bin/cat";
$DATE="/bin/date";
$ROUTE="/sbin/route";
# $WINNUKE="/usr/local/spamcontrol/winnuke";
# define if retaliatory action desired -
# WARNING, use WINNUKE at your own risk!
#####################################################
# End of user-defined parts #
#####################################################
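To show what the $blackhole and $ROUTE settings accomplish, here is a hedged Perl sketch of the blocking step. It assumes BSD route(8) syntax, route add -host <destination> <gateway> (Linux net-tools route expects gw before the gateway), and the sub names are hypothetical; it is not spamshield.pl's own code. Validating the address first matters because the IP comes from log text.

```perl
use strict;
use warnings;

# Values taken from the configuration above.
my $ROUTE     = "/sbin/route";
my $blackhole = "209.204.146.22";

# Accept only dotted-quad IPv4 addresses with octets 0-255.
sub is_ipv4 {
    my ($ip) = @_;
    my @octets = $ip =~ /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})\z/
        or return 0;
    return (grep { $_ > 255 } @octets) ? 0 : 1;
}

# Build the route command that sends the spammer's traffic to $blackhole.
sub blackhole_command {
    my ($spammer) = @_;
    die "refusing to route non-IPv4 value '$spammer'\n" unless is_ipv4($spammer);
    die "spammer address equals \$blackhole\n" if $spammer eq $blackhole;
    return ($ROUTE, 'add', '-host', $spammer, $blackhole);
}

# List-form system() avoids passing log-derived text through a shell:
# system(blackhole_command('192.0.2.45')) == 0 or warn "route failed: $?\n";
```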
Run ./spamshield.pl as root by hand, note any and all errors
encountered (usually the result of mis-defined variables), then correct them.
Ensure that your variable paths are correct!
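One way to check the paths before the first run is a small preflight sub like the hypothetical one below; it is not part of SpamShield. It reports any configured helper program that is not an executable file, and verifies that $securetmp is owned by the running user and is not writable by group or others.

```perl
use strict;
use warnings;

# Return a list of human-readable problems; an empty list means all is well.
sub preflight {
    my (%cfg) = @_;
    my @problems;
    for my $name (sort keys %{ $cfg{programs} }) {
        my $path = $cfg{programs}{$name};
        push @problems, "\$$name: $path is not an executable file"
            unless -f $path && -x _;
    }
    my @st = stat $cfg{securetmp};
    if (!@st || !-d _) {
        push @problems, "\$securetmp: $cfg{securetmp} is not a directory";
    } else {
        push @problems, "\$securetmp: not owned by uid " . $> if $st[4] != $>;
        push @problems, "\$securetmp: writable by group or others" if $st[2] & 022;
    }
    return @problems;
}
```

For example, call it with programs => { TAIL => "/usr/bin/tail", ROUTE => "/sbin/route" } and securetmp => "/usr/local/spamcontrol", then print whatever it returns.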
After running ./spamshield.pl for the first time, you should have the
following files under the directory /usr/local/spamcontrol:
blocked - current list of blocked sites, serves as log of past activity.
blockignore - list of IPs that are always ignored and never acted upon.
dontblock - list of IPs that are never blocked, but will cause spam alarms.
spamshield.pl - the program.
ss-ipstats - count of how many emails have been received from each IP host (created after the program has run).
ss-mailstats - list of every maillog line condensed into three space-separated parameters: IP number, number of recipients in this batch, and sender address used on From_ line. This makes for easy grepping and sorting for other purposes.
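The ss-mailstats condensation can be approximated by parsing sendmail's from= log lines. This sketch assumes sendmail 8.x log syntax, in which a from= line carries nrcpts= and relay=host [ip] fields; mailstats_fields is a hypothetical name, not the parser spamshield.pl uses.

```perl
use strict;
use warnings;

# Return (IP, recipient count, sender) for a sendmail "from=" line,
# or an empty list if the line does not match.
sub mailstats_fields {
    my ($line) = @_;
    return unless $line =~ /\bfrom=<?([^>,\s]*)>?,.*?\bnrcpts=(\d+),.*?\brelay=[^\[]*\[(\d{1,3}(?:\.\d{1,3}){3})\]/;
    return ($3, $2, $1);   # IP, recipients, sender
}

my $sample = 'Mar  3 10:15:02 mx sendmail[4242]: g23AF2a04242: '
           . 'from=<promo@spamdomain.com>, size=2048, class=0, nrcpts=25, '
           . 'msgid=<abc@spamdomain.com>, proto=SMTP, relay=spamdomain.com [192.0.2.45]';
print join(' ', mailstats_fields($sample)), "\n";   # 192.0.2.45 25 promo@spamdomain.com
```

Lines that do not match, such as the to= delivery lines, yield an empty list and can simply be skipped.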
In order to correct any difficult errors, try increasing the DEBUG value.
For optimal performance, run the program automatically from cron every three minutes.
As root, run crontab -e and add an entry like this (note the five time fields:
minute, hour, day of month, month, and day of week):
*/3 * * * * /usr/local/spamcontrol/spamshield.pl
On some Unix systems, you need to redirect the job's output to /dev/null to
avoid an email to root each time the script runs. I typically add the
following to the end of each cron line to send standard error (file descriptor 2)
and standard output (file descriptor 1) to /dev/null:
*/3 * * * * /usr/local/spamcontrol/spamshield.pl 2>/dev/null 1>/dev/null
SpamShield has taken a sensible approach to filtering spam.
Despite an array of products that claim to block spam mail, I have yet to find one that is 100 percent perfect. Most filters work to a degree, while others add yet another layer of inconvenience to the end user.
Simply put, SpamShield does what it was designed to do. As new versions evolve, I have confidence that this product will become ever more popular.
Visit www.spamshield.org/ to read Kai's latest rants -- a little on spam, a little on the rest of the world. And coming soon, version 2.0.
Glenn Graham has been working with telecommunications since 1977.
Copyright © 2009 O'Reilly Media, Inc.