Spam Filtering with Sendmail Milters and Greylisting
by Emmanuel Dreyfus, 06/10/2004
In the first part of this series, we surveyed spam filtering techniques: where in the electronic mail framework filtering measures can operate, and what kinds of filtering are currently available.
This article focuses on the development of a spam filter, through the
example of milter-greylist, a greylisting plugin for Sendmail. We assume that
the reader knows the C programming language reasonably well. A basic
understanding of TCP/IP is also useful.
Sendmail and Milter
Sendmail made MTA-level filtering easy by introducing the Milter API. Milter
is a contraction of the term "mail filter." Milters are small daemons that
communicate with Sendmail through UNIX sockets or TCP/IP connections. They are
easy to configure; you just need to add a few lines to the
sendmail.cf configuration file. Here is an example for double
filtering by milter-regex and milter-greylist:
O InputMailFilters=regex,greylist
Xregex, S=local:/var/run/milter-regex/sock, F=T
Xgreylist, S=local:/var/milter-greylist/sock, F=T
O Milter.macros.connect=j, _, {daemon_name}, {if_name}, {if_addr}, {client_addr}
O Milter.macros.envfrom=i, {mail_mailer}, {mail_host}, {mail_addr}
O Milter.macros.envrcpt={rcpt_mailer}, {rcpt_host}, {rcpt_addr}
The first line lists the milters to invoke for each message. Here,
filtering first uses regex, then greylist. Those names must correspond to the
next lines, which start with an X.
The X lines define each milter's properties: how to contact the milter (here, a
local UNIX socket) and what should happen if the milter fails.
(F=T means a temporary error, F=R means a permanent
error, and no F= means pass through as if the filter did not
exist.) Timeout values (T=) are optional and are omitted here.
The remaining lines select which Sendmail macros to export to the milter. We will see how to use them when we deal with the actual implementation.
The milter design allows filters to run on the same machine as Sendmail or on remote machines reached over the network. It is possible to build highly scalable setups, with farms of milter machines and load distributed through rotating DNS or TCP redirection.
Milter Gallery
Many milters are already available for anti-spam, anti-virus, archival, accounting, and various other purposes. Here is a set of my favorites:
- milter-regex filters mail by applying regular expressions. It can filter out files based on headers (the Win32 header, for instance) or by extension. Here is a sample of a milter-regex config file:

    reject "Sorry, we do not accept ZIP archives anymore"
    body /^(Content-Type: [^;]*; | )name=".*\.zip"/ie
    body /^(Content-Disposition: attachment; | )filename=".*\.zip"/ie

  It is also extremely useful when dealing with distributed denial-of-service attacks. If you can find a common pattern in the junk messages, you can filter them out with milter-regex.

- milter-greylist is an anti-spam tool I wrote. It uses the greylist method, and for now, it just zaps all of the spam without a false positive.

  The principle is simple: on temporary errors, real MTAs wait for a while and retry sending the message. Spam engines do not. When milter-greylist receives a message, it refuses it with a temporary error, storing a tuple (source IP, sender email, recipient email) in a table. On the next attempt, if it finds the tuple in the table, it accepts the message.

  Of course, spammers can start resending their messages. If this happens some day, we can force each message to wait for one hour before being accepted. If the spammer stays at the same address for one hour, the odds are good he will appear in a DNS-based blacklist before the second attempt.

  White-listing and auto-white-listing can also reduce the delay on legitimate mail.

- milter-sender is a real-time, sender-address validator. It works by trying to send a message to the sender address of each incoming message. If it receives a temporary error, it temporarily refuses the incoming message. If it receives a permanent error, it refuses the incoming message permanently, and so on.

- j-chkmail checks the message for forbidden attachment files and will refuse them. It is very useful against viruses, and risks fewer false positives than the one-line regular expression matching done by milter-regex.
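The greylisting principle described in the milter-greylist entry above can be sketched in a few lines of C. This is only an illustration of the tuple table, not the actual milter-greylist code: the names gl_check, gl_entry, and gl_verdict, the fixed-size array, and the linear search are all assumptions made for brevity. A real filter would also need locking, expiry of old entries, and persistent storage.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* All names and sizes below are hypothetical, chosen for this sketch. */
#define GL_MAX_ENTRIES 1024
#define GL_IP_LEN      46    /* room for a textual IPv6 address */
#define GL_ADDR_LEN    256

enum gl_verdict { GL_TEMPFAIL, GL_ACCEPT };

struct gl_entry {
    char   ip[GL_IP_LEN];      /* source IP of the sending host */
    char   from[GL_ADDR_LEN];  /* envelope sender */
    char   rcpt[GL_ADDR_LEN];  /* envelope recipient */
    time_t first_seen;         /* time of the first delivery attempt */
};

static struct gl_entry gl_table[GL_MAX_ENTRIES];
static size_t gl_count;

/* Copy src into dst, truncating if needed and always NUL-terminating. */
static void gl_copy(char *dst, size_t len, const char *src)
{
    strncpy(dst, src, len - 1);
    dst[len - 1] = '\0';
}

/* Compare a stored (possibly truncated) field with a caller-supplied one. */
static int gl_same(const char *stored, const char *given, size_t len)
{
    return strncmp(stored, given, len - 1) == 0;
}

/*
 * Decide what to do with one delivery attempt.
 * An unknown tuple is recorded and refused with a temporary error.
 * A known tuple is accepted once at least `delay` seconds have passed
 * since the first attempt; pass 0 to accept on any retry.
 */
enum gl_verdict gl_check(const char *ip, const char *from, const char *rcpt,
                         time_t now, time_t delay)
{
    size_t i;

    assert(ip != NULL && from != NULL && rcpt != NULL);

    for (i = 0; i < gl_count; i++) {
        const struct gl_entry *e = &gl_table[i];

        if (gl_same(e->ip, ip, GL_IP_LEN) &&
            gl_same(e->from, from, GL_ADDR_LEN) &&
            gl_same(e->rcpt, rcpt, GL_ADDR_LEN)) {
            if (difftime(now, e->first_seen) >= (double)delay)
                return GL_ACCEPT;
            return GL_TEMPFAIL;    /* retried too early */
        }
    }

    /* First time we see this tuple: remember it, if there is room left. */
    if (gl_count < GL_MAX_ENTRIES) {
        struct gl_entry *e = &gl_table[gl_count++];

        gl_copy(e->ip, sizeof(e->ip), ip);
        gl_copy(e->from, sizeof(e->from), from);
        gl_copy(e->rcpt, sizeof(e->rcpt), rcpt);
        e->first_seen = now;
    }
    return GL_TEMPFAIL;
}
```

Calling gl_check() twice with the same tuple returns GL_TEMPFAIL for the first attempt and GL_ACCEPT once delay seconds have elapsed. A delay of 3600 corresponds to the one-hour wait mentioned above.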
There are also various milters to interface Sendmail with AMaViS, SpamAssassin, and many other tools. Web sites such as milter.org feature lists of available milters.
Writing Your Own Milter
Milters are linked with libmilter, which handles the burden of the
communication with Sendmail. Milter authors just have to use the Milter API,
by including <libmilter/mfapi.h> and by linking with
libmilter. Because libmilter relies on libpthread, libpthread is required in
milter linkage as well.
Starting Up
Writing a milter tends to be surprisingly simple. Start by writing a daemon
that will parse its command-line options, detach to the background, open log
files, and so on. In order to specify the socket that will be used to
communicate with Sendmail, use smfi_setconn():
smfi_setconn(socket)
where socket is a string, usually taken from the command line,
that identifies the location of the socket. For a local socket, you can just
use a filesystem path.
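Besides a bare path, libmilter also accepts prefixed socket specifications: local:/path for a UNIX socket and inet:port@host for a TCP socket. As a rough sketch, the two helpers below build such strings so that they can be handed to smfi_setconn(). The function names milter_local_spec and milter_inet_spec are hypothetical and are not part of libmilter.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helpers (not part of libmilter) that format a socket
 * specification string. Both return 0 on success and -1 if the
 * arguments are unusable or the result does not fit in buf.
 */

/* Build "local:/path/to/socket" for a UNIX-domain socket. */
int milter_local_spec(char *buf, size_t len, const char *path)
{
    int n;

    assert(buf != NULL && path != NULL);
    if (strlen(path) == 0)
        return -1;
    n = snprintf(buf, len, "local:%s", path);
    return (n >= 0 && (size_t)n < len) ? 0 : -1;
}

/* Build "inet:port@host" for a TCP socket. */
int milter_inet_spec(char *buf, size_t len, unsigned int port,
                     const char *host)
{
    int n;

    assert(buf != NULL && host != NULL);
    if (port == 0 || port > 65535 || strlen(host) == 0)
        return -1;
    n = snprintf(buf, len, "inet:%u@%s", port, host);
    return (n >= 0 && (size_t)n < len) ? 0 : -1;
}
```

For instance, milter_inet_spec(buf, sizeof(buf), 3333, "localhost") fills buf with inet:3333@localhost. The port number here is arbitrary. The resulting string is what a milter would pass to smfi_setconn().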
The other required operation is to fill a struct, smfiDesc, with
a collection of callbacks and pass it to libmilter through
smfi_register():
struct smfiDesc smfilter =
{
    "greylist",     /* filter name */
    SMFI_VERSION,   /* version code */
    SMFIF_ADDHDRS,  /* flags */
    mlfi_connect,   /* connection info filter */
    NULL,           /* SMTP HELO command filter */
    mlfi_envfrom,   /* envelope sender filter */
    mlfi_envrcpt,   /* envelope recipient filter */
    NULL,           /* header filter */
    NULL,           /* end of header */
    NULL,           /* body block filter */
    mlfi_eom,       /* end of message */
    NULL,           /* message aborted */
    mlfi_close,     /* connection cleanup */
};
/* (some code) */
if (smfi_register(smfilter) == MI_FAILURE) {
    fprintf(stderr, "%s: smfi_register failed\n", argv[0]);
    exit(EX_UNAVAILABLE);
}
Once this is done, the program hands control over to libmilter for good by
calling smfi_main():
return smfi_main();



