ONLamp.com    
 Published on ONLamp.com (http://www.onlamp.com/)


Spam Filtering with Sendmail Milters and Greylisting

by Emmanuel Dreyfus
06/10/2004

In the first part of this series, we studied the various spam filtering techniques; specifically, where in the electronic mail framework filtering measures can work and what kinds of filtering techniques are currently available.

This article focuses on the development of a spam filter, through the example of milter-greylist, a greylisting plugin for Sendmail. We assume that the reader knows the C programming language reasonably well. A basic understanding of TCP/IP is also useful.

Sendmail and Milter

Sendmail made MTA-level filtering easy by introducing the Milter API. Milter is a contraction of the term "mail filter." Milters are small daemons that communicate with Sendmail through UNIX sockets or TCP/IP connections. They are easy to configure; you just need to add a few lines to the sendmail.cf configuration file. Here is an example for double filtering by milter-regex and milter-greylist:

O InputMailFilters=regex,greylist
Xregex, S=local:/var/run/milter-regex/sock, F=T
Xgreylist, S=local:/var/milter-greylist/sock, F=T

O Milter.macros.connect=j, _, {daemon_name}, {if_name}, {if_addr}, {client_addr}
O Milter.macros.envfrom=i, {mail_mailer}, {mail_host}, {mail_addr}
O Milter.macros.envrcpt={rcpt_mailer}, {rcpt_host}, {rcpt_addr}

The first line lists the milters to invoke for each message. Here, filtering first uses regex, then greylist. Those names must correspond to the next lines, which start with an X.

The X lines define each milter property: how to contact the milter (here, a local UNIX socket) and what should happen if the milter fails. (F=T means a temporary error, F=R means a permanent error, and no F= means pass through as if the filter did not exist.) Timeout values are optional.

The remaining lines select which Sendmail macros to export to the milter. We will see how to use them when we deal with the actual implementation.

The milter design allows milters to run on the same machine as Sendmail, or on another machine reached through the network. It is possible to build highly scalable setups, with farms of milter machines and load distributed through rotating DNS or TCP redirection.

Milter Gallery

Many milters are already available for anti-spam, anti-virus, archival, accounting, and various other purposes. Here is a set of my favorites:

There are also various milters to interface Sendmail with AMaViS, SpamAssassin, and many other tools. Web sites such as milter.org feature lists of available milters.

Writing Your Own Milter

Milters are linked with libmilter, which handles the burden of the communication with Sendmail. Milter authors just have to use the Milter API, by including <libmilter/mfapi.h> and by linking with libmilter. Because libmilter relies on libpthread, libpthread is required in milter linkage as well.

Starting Up

Writing a milter tends to be surprisingly simple. Start by writing a daemon that will parse its command-line options, detach to the background, open log files, and so on. In order to specify the socket that will be used to communicate with Sendmail, use smfi_setconn():

smfi_setconn(socket)

where socket is a string, usually taken from the command line, that identifies the location of the socket. For a local socket, you can just use a filesystem path.

The other required operation is to fill a struct, smfiDesc, with a collection of callbacks and pass it to libmilter through smfi_register():

struct smfiDesc smfilter =
{
	"greylist",     /* filter name */
	SMFI_VERSION,   /* version code */
	SMFIF_ADDHDRS,  /* flags */
	mlfi_connect,   /* connection info filter */
	NULL,           /* SMTP HELO command filter */
	mlfi_envfrom,   /* envelope sender filter */
	mlfi_envrcpt,   /* envelope recipient filter */
	NULL,           /* header filter */
	NULL,           /* end of header */
	NULL,           /* body block filter */
	mlfi_eom,       /* end of message */
	NULL,           /* message aborted */ 
	mlfi_close,     /* connection cleanup */
}; 

/* (some code) */

if (smfi_register(smfilter) == MI_FAILURE) {
	fprintf(stderr, "%s: smfi_register failed\n", argv[0]);
	exit(EX_UNAVAILABLE);
}

Once this is done, the program hands control over to libmilter for good by calling smfi_main():

return smfi_main();

Callbacks

Now, every time the server handles an email, libmilter will call the callbacks we registered through smfi_register(). For instance, in this example, mlfi_connect() is registered as the connection-time callback. Therefore, each time an SMTP client connects to the machine, libmilter will invoke our mlfi_connect() function.

Here is the mlfi_connect() function for milter-greylist:

sfsistat
mlfi_connect(ctx, hostname, addr)
	SMFICTX *ctx;
	char *hostname;
	_SOCK_ADDR *addr;
{       
	struct mlfi_priv *priv;
	struct sockaddr_in *addr_in;

	if ((priv = malloc(sizeof(*priv))) == NULL)
		return SMFIS_TEMPFAIL;
		       
	smfi_setpriv(ctx, priv);
	bzero((void *)priv, sizeof(*priv));
	priv->priv_whitelist = EXF_UNSET;
		
	addr_in = (struct sockaddr_in *)addr;
	if ((addr_in != NULL) && (addr_in->sin_family == AF_INET))
		priv->priv_addr.s_addr = addr_in->sin_addr.s_addr;
		       
	return SMFIS_CONTINUE;
}

We have an opaque context pointer that libmilter will hand us on each callback for the same SMTP connection. libmilter uses it to store various pieces of information about the connection, including a user private pointer that we can use to store our own data. smfi_setpriv() and smfi_getpriv() set and retrieve this private pointer, respectively.

milter-greylist's mlfi_connect() starts by allocating some private memory for a mlfi_priv structure, which is defined like this:

struct mlfi_priv {
	struct in_addr priv_addr;
	char priv_from[ADDRLEN + 1];
	time_t priv_elapsed;
	int priv_whitelist;
	char *priv_queueid;
};

Our goal is to retrieve the tuple (source IP, sender email, recipient email), so mlfi_priv has some storage for this information. In mlfi_connect(), we store the client IP address in the priv_addr field of mlfi_priv.
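To make the tuple concrete, here is a small sketch of a record holding the three values, with a comparison function. The struct tuple type, the helper names, and the ADDRLEN value are all hypothetical and are not taken from milter-greylist; comparing addresses without regard to case is likewise an assumption of this sketch.

```c
#include <netinet/in.h>
#include <string.h>
#include <strings.h>

#define ADDRLEN 255	/* assumed maximum address length for this sketch */

/* Hypothetical record for one (source IP, sender, recipient) tuple. */
struct tuple {
	struct in_addr addr;
	char from[ADDRLEN + 1];
	char rcpt[ADDRLEN + 1];
};

/* Fill a tuple; returns 0, or -1 if an address does not fit. */
int
tuple_set(struct tuple *t, struct in_addr addr,
    const char *from, const char *rcpt)
{
	if (strlen(from) > ADDRLEN || strlen(rcpt) > ADDRLEN)
		return -1;

	t->addr = addr;
	strcpy(t->from, from);
	strcpy(t->rcpt, rcpt);
	return 0;
}

/* Return 1 when both tuples describe the same delivery attempt. */
int
tuple_equal(const struct tuple *a, const struct tuple *b)
{
	return a->addr.s_addr == b->addr.s_addr &&
	    strcasecmp(a->from, b->from) == 0 &&
	    strcasecmp(a->rcpt, b->rcpt) == 0;
}
```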

Anatomy of an SMTP Transaction

Before moving further, let us look at the anatomy of an SMTP transaction. Lines starting with >>> are sent from the client to the server, and lines starting with <<< are sent from the server to the client.

<<< 220 mx1.example.net ESMTP Sendmail 8.12.10/jtpda-5.4 ready at Fri, 26 Mar 2004 15:23:56 +0100 (CET)
>>> HELO mail.example.com
<<< 250 mx1.example.net Hello mail.example.com [192.0.2.26], pleased to meet you
>>> MAIL FROM: <John.Smith@example.com>
<<< 250 2.1.0 <John.Smith@example.com>... Sender ok
>>> RCPT TO: <Reginald.Wesson@example.net>
<<< 250 2.1.5 <Reginald.Wesson@example.net>... Recipient ok
>>> DATA
<<< 354 Enter mail, end with "." on a line by itself
>>> From: <John.Smith@example.com>
>>> To: <Reginald.Wesson@example.net>
>>> Date: Fri, 26 Mar 2004 15:23:57 +0100 (CET)
>>> Subject: Test
>>>
>>> This is a test message
>>> .
<<< 250 2.0.0 i2QENuV9026193 Message accepted for delivery
>>> QUIT
<<< 221 2.0.0 mx1.example.net closing connection

More Callbacks

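After mlfi_connect(), libmilter will invoke the other callbacks we registered as the transaction proceeds: mlfi_envfrom() once the client has sent MAIL FROM, mlfi_envrcpt() for each RCPT TO, mlfi_eom() when the end of the message has been received, and mlfi_close() when the connection is cleaned up.

Additionally, the following checkpoints could have callbacks, if we had registered them: the HELO command, each header, the end of the headers, each body block, and an aborted message.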

The Milter API documents all of the possible callbacks. In each of the callbacks, it is possible to call smfi_getpriv() to fetch the pointer to our private data, so we can read and modify it.

Accepting or Rejecting

In each callback, the return value can cause Sendmail to reject the message either permanently (SMFIS_REJECT) or temporarily (SMFIS_TEMPFAIL). Returning SMFIS_CONTINUE carries on the transaction.

Depending on the callback, rejecting can have different meanings. For example, mlfi_envrcpt() is recipient-oriented. It can be called several times for a message that has several recipients. Rejecting one recipient will remove that recipient from the recipient list, but the message will still go through for the other ones.

In message-oriented callbacks, such as mlfi_eom(), rejecting causes the message to be rejected for all of the recipients.
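The difference between the two kinds of rejection can be sketched with a toy model that does not use libmilter at all. The enum, the function pointer types, and every function below are stand-ins invented for this illustration; deliveries() merely mimics the way the verdicts combine, as described above.

```c
#include <stddef.h>
#include <string.h>

/* Stand-ins for the libmilter return codes, for this sketch only. */
enum verdict { V_CONTINUE, V_REJECT, V_TEMPFAIL };

typedef enum verdict (*rcpt_check)(const char *rcpt);
typedef enum verdict (*eom_check)(void);

/* Hypothetical recipient check: refuse one address, accept the rest. */
enum verdict
refuse_blocked(const char *rcpt)
{
	if (strcmp(rcpt, "blocked@example.net") == 0)
		return V_REJECT;
	return V_CONTINUE;
}

/* Hypothetical end-of-message checks. */
enum verdict
accept_message(void)
{
	return V_CONTINUE;
}

enum verdict
defer_message(void)
{
	return V_TEMPFAIL;
}

/*
 * A recipient-stage rejection drops only that recipient, while an
 * end-of-message rejection drops the message for everybody.
 * Returns the number of recipients the message is delivered to.
 */
size_t
deliveries(const char *const rcpts[], size_t nrcpts,
    rcpt_check on_rcpt, eom_check on_eom)
{
	size_t i, accepted = 0;

	for (i = 0; i < nrcpts; i++)
		if (on_rcpt(rcpts[i]) == V_CONTINUE)
			accepted++;

	if (accepted == 0 || on_eom() != V_CONTINUE)
		return 0;

	return accepted;
}
```

With three recipients, one of which is refused by refuse_blocked(), the message reaches two recipients when the end-of-message check accepts it, and none when that check defers it.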

Cleaning Up After a Message is Handled

Whatever happens to the message, the mlfi_close() callback will be called. This is the place to de-allocate private data. Failure to do so will cause a memory leak that will eventually crash the milter:

sfsistat
mlfi_close(ctx)
	SMFICTX *ctx;
{        
	struct mlfi_priv *priv;

	if ((priv = (struct mlfi_priv *) smfi_getpriv(ctx)) != NULL) {
		free(priv);
		smfi_setpriv(ctx, NULL);
	}

	return SMFIS_CONTINUE;
}

Multi-Threading

We complete our tuple in the mlfi_envrcpt() callback. We already have the source IP and the sender email stored in our mlfi_priv structure, and now we finally receive one recipient address.

This is the time for various checks, such as the whitelist check that milter-greylist's except_filter() function performs. This function is worth a few words. It walks a chained list of exceptions, looking for an entry matching the recipient address or the source IP:

LIST_FOREACH(ex, &except_head, e_list) {
	if (ex->e_type != E_RCPT)
		continue;

	if (emailcmp(rcpt, ex->e_rcpt) == 0) {
		found = 1;
		break;
	}
}

The LIST_FOREACH macro comes from <sys/queue.h>, along with a few other macros for defining and walking different kinds of chained lists. These macros are extremely useful, since they greatly reduce your ability to write bugs in chained-list code.
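For readers who have never used <sys/queue.h>, here is a self-contained sketch of a chained list built with these macros. The rcpt_entry type and the rcpt_*() helpers are hypothetical and unrelated to milter-greylist's own structures, and strcasecmp(3) stands in for the emailcmp() function shown above.

```c
#include <stdlib.h>
#include <string.h>
#include <strings.h>
#include <sys/queue.h>

/* Hypothetical entry type for this sketch; not milter-greylist's own. */
struct rcpt_entry {
	char addr[64];
	LIST_ENTRY(rcpt_entry) link;	/* pointers managed by the macros */
};

LIST_HEAD(rcpt_list, rcpt_entry);	/* declares struct rcpt_list */

/* Add an address at the head of the list; returns 0, or -1 on failure. */
int
rcpt_add(struct rcpt_list *head, const char *addr)
{
	struct rcpt_entry *e;

	if (strlen(addr) >= sizeof(e->addr))
		return -1;
	if ((e = malloc(sizeof(*e))) == NULL)
		return -1;

	strcpy(e->addr, addr);
	LIST_INSERT_HEAD(head, e, link);
	return 0;
}

/* Return 1 if addr is in the list (ignoring case), 0 otherwise. */
int
rcpt_listed(struct rcpt_list *head, const char *addr)
{
	struct rcpt_entry *e;

	LIST_FOREACH(e, head, link)
		if (strcasecmp(addr, e->addr) == 0)
			return 1;

	return 0;
}

/* Unlink and free every entry. */
void
rcpt_clear(struct rcpt_list *head)
{
	struct rcpt_entry *e;

	while ((e = LIST_FIRST(head)) != NULL) {
		LIST_REMOVE(e, link);
		free(e);
	}
}
```

The list head must be set up with LIST_INIT() before the first rcpt_add() call.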

Whether you use chained lists or fixed-size tables, you cannot freely read and write data shared among threads in a milter, because the code runs in a multi-threaded environment. Each time Sendmail handles a new message, it will make a new connection to the milter, where libmilter spawns a new thread to handle it. The milter may be processing several messages simultaneously.

It is therefore not safe to operate on shared data; another thread might be writing while we read, thus causing bugs. For instance, if we walk a chained list while another thread removes an item from it, we might jump out of the list and crash.

The workaround is locking. Each time we need to read some global data, we use a read lock. Each time we write to it, we use a write lock. The difference between read locks and write locks is that many threads can share a read lock, whereas only one thread can have a write lock.

In milter-greylist, we use lock macros to avoid bloating the code:

#define WRLOCK(lock) if (pthread_rwlock_wrlock(&(lock)) != 0) {           \
                syslog(LOG_ERR, "%s:%d pthread_rwlock_wrlock failed: %s", \
                    __FILE__, __LINE__, strerror(errno));                 \
                exit(EX_SOFTWARE);                                        \
        }

Before using the lock, it must be initialized. Do this by using pthread_rwlock_init() before calling smfi_main().

There are many other problems caused by multi-threading. For instance, milter-greylist has to write its database to a file when it is modified, so that after a restart it can resume operation where it halted. It is not possible to dump the database to a file from a callback, because another thread could attempt to do this at the same time. To work around this problem, dump.c devotes a single dumper thread to this operation. This thread starts (using pthread_create() from main()) before the smfi_main() call.

The dumper thread sleeps on a flag, using pthread_cond_wait(). Each time another thread modifies the database, it wakes the dumper thread by calling pthread_cond_signal(), and the dumper thread handles the job of flushing data to disk.
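The following is a minimal sketch of that pattern, not the code from dump.c. All names are hypothetical, the "flush" is reduced to incrementing a counter, and a stop request is added so that the thread can be shut down cleanly in a test.

```c
#include <pthread.h>

/* Hypothetical state for this sketch; milter-greylist's dump.c differs. */
static pthread_mutex_t dump_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t dump_cond = PTHREAD_COND_INITIALIZER;
static int dump_dirty;	/* set when the database has changed */
static int dump_quit;	/* set to ask the dumper to exit */
static int dump_count;	/* number of flushes performed so far */

/* Dumper thread: sleep until signalled, then flush. */
void *
dumper(void *arg)
{
	(void)arg;

	pthread_mutex_lock(&dump_mutex);
	for (;;) {
		/* Loop on the predicate to cope with spurious wakeups. */
		while (!dump_dirty && !dump_quit)
			pthread_cond_wait(&dump_cond, &dump_mutex);

		if (!dump_dirty)
			break;	/* quit requested, nothing left to flush */

		dump_dirty = 0;
		dump_count++;	/* a real dumper would write the file here */
	}
	pthread_mutex_unlock(&dump_mutex);

	return NULL;
}

/* Called by any thread right after it has modified the database. */
void
dump_request(void)
{
	pthread_mutex_lock(&dump_mutex);
	dump_dirty = 1;
	pthread_cond_signal(&dump_cond);
	pthread_mutex_unlock(&dump_mutex);
}

/* Ask the dumper to finish pending work and exit. */
void
dump_stop(void)
{
	pthread_mutex_lock(&dump_mutex);
	dump_quit = 1;
	pthread_cond_signal(&dump_cond);
	pthread_mutex_unlock(&dump_mutex);
}

/* Return the number of flushes performed so far. */
int
dump_flushes(void)
{
	int n;

	pthread_mutex_lock(&dump_mutex);
	n = dump_count;
	pthread_mutex_unlock(&dump_mutex);
	return n;
}
```

Because the flag is only tested and changed while the mutex is held, a request made before the dumper goes to sleep is never lost. Several requests arriving in a row may be served by a single flush, which is fine when each flush writes the whole database.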

Thread Safety and Third-Party Code

Last but not least, a milter must only call thread-safe functions from libraries. Any function that uses global variables or static memory is thread-unsafe. For instance, you have to use inet_ntop(3) instead of inet_ntoa(3).
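As an illustration of the thread-safe alternative, this sketch wraps inet_ntop(3) so that the textual address always lands in a buffer owned by the caller. The addr_to_string() name is hypothetical.

```c
#include <stddef.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/*
 * Format an IPv4 address into a caller-supplied buffer. Unlike
 * inet_ntoa(3), which returns a pointer to static storage, inet_ntop(3)
 * writes only into the buffer it is given, so each thread can use its own.
 * Returns buf on success, NULL on failure.
 */
const char *
addr_to_string(const struct in_addr *addr, char *buf, size_t len)
{
	return inet_ntop(AF_INET, addr, buf, (socklen_t)len);
}
```

A buffer of INET_ADDRSTRLEN bytes, declared on the stack of the calling function, is large enough for any IPv4 address.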

Thread unsafety can be hard to anticipate. For instance, if your libc features a BIND4-based DNS resolver, using DNS resolver functions will lead to trouble. This kind of problem can be quite hard to foresee, especially when linking with third-party libraries.

Fortunately, once it shows up, this kind of problem is easy to track down. After receiving a few messages, the milter will hang. At that time, if you attach gdb(1) to it and type the bt command (which prints a stack backtrace), you will always see it stuck in the same code path. This code path is likely to contain a thread-unsafe function. Here is an example:

# ps -ax | grep milter-greylist
13694 ?? S     0:00.13 milter-greylist -p /var/milter-greylist/sock
# gdb milter-greylist
(gdb) attach 13694
0x4193f238 in recvfrom () from /usr/lib/libc.so.12
(gdb) bt
#0  0x4193f238 in recvfrom () from /usr/lib/libc.so.12
#1  0x418a43c0 in __pth_sc_recvfrom () from /usr/pkg/lib/libpthread.so.20
#2  0x418a2cfc in pth_recvfrom_ev () from /usr/pkg/lib/libpthread.so.20
#3  0x418a2a7c in pth_recv_ev () from /usr/pkg/lib/libpthread.so.20
#4  0x418a2a50 in pth_recv () from /usr/pkg/lib/libpthread.so.20
#5  0x418a4444 in recv () from /usr/pkg/lib/libpthread.so.20
#6  0x418a10fc in pth_poll_ev () from /usr/pkg/lib/libpthread.so.20
#7  0x418a0d44 in pth_poll () from /usr/pkg/lib/libpthread.so.20
#8  0x418a3d24 in poll () from /usr/pkg/lib/libpthread.so.20
#9  0x418799fc in res_send () from /usr/lib/libresolv.so.1
#10 0x41877ef4 in res_query () from /usr/lib/libresolv.so.1
#11 0x4184377c in SPF_dns_lookup_resolv (spfdcid=0x190caa0, 
    domain=0x182b290 "example.com", rr_type=16, should_cache=1)
    at spf_dns_resolv.c:139
#12 0x4183fc64 in SPF_dns_lookup (spfdcid=0x0, domain=0x1a1df98 "", 
    rr_type=64, should_cache=2) at spf_dns.c:57
#13 0x4184291c in SPF_get_spf (spfcid=0x1987c00, spfdcid=0x190caa0, 
    domain=0x182b290 "example.com", c_results=0x1a1f8c8) at spf_get_spf.c:76
#14 0x418423ec in SPF_result (spfcid=0x1987c00, spfdcid=0x190caa0, domain=0x0)
    at spf_result.c:376
#15 0x180b0e4 in spf_alt_check (in=0x0, 
    fromp=0x190c940 "<John.Doe@example.com>") at spf.c:126
#16 0x18022ec in mlfi_envfrom (ctx=0x0, envfrom=0x182b250)
    at milter-greylist.c:178
#17 0x180e820 in st_sender ()
#18 0x180de14 in mi_engine ()
#19 0x180c4dc in mi_handle_session ()
#20 0x180bd50 in mi_thread_handle_wrapper ()
#21 0x4189bf7c in pth_spawn_trampoline () from /usr/pkg/lib/libpthread.so.20
#22 0x41898990 in pth_mctx_set_bootstrap () from /usr/pkg/lib/libpthread.so.20
#23 0x418988dc in pth_mctx_set_trampoline () from /usr/pkg/lib/libpthread.so.20
#24 0x7fffefdc in ?? ()

Note that if, when typing bt, you see no function name, make sure the program was built with -g and that the binary was not stripped at installation.

The last function invoked before the libpthread machinery is res_send(3). A quick search on the Internet reveals that this function is not thread-safe in BIND4, which is what causes the problem. You must use a BIND8 resolver to work around this problem.

Reading Macros

From time to time, it is necessary to read some of Sendmail's macros by using smfi_getsymval(). This is how, for example, mlfi_envrcpt() reads the message queue ID:

if ((priv->priv_queueid = smfi_getsymval(ctx, "{i}")) == NULL) {
        syslog(LOG_DEBUG, "smfi_getsymval failed for {i}: %s", 
            strerror(errno));
        priv->priv_queueid = "(unknown id)"; 
}

This can read only macros explicitly exported in sendmail.cf using the O Milter.macros configuration lines.

Changing Headers

In order to make debugging easier, milter-greylist adds an X-Greylist header to every message it handles, explaining whether the message was delayed and for how long, whether it was whitelisted and why, and so on. smfi_addheader(), called from mlfi_eom(), handles this. This function takes the opaque context pointer, the header name, and the header value as arguments.

Conclusion

Milter is a scalable, easy-to-use solution for MTA-level filtering. The API is quite straightforward to use and hides very few pitfalls. It is easy to get started and to go on to develop complex filtering techniques. It is a great asset in the battle against spam and viruses.

milter-greylist was really easy to implement. It took under a week to produce something that works (with a few bugs), and less than a month to complete version 1.0. I hope this article will help potential developers to produce more milters.

Thanks to John Klos for reviewing this article.

Emmanuel Dreyfus is a system and network administrator in Paris, France, and is currently a developer for NetBSD.



Copyright © 2009 O'Reilly Media, Inc.