Creating Custom Email Queries
Pages: 1, 2
Understanding the Job Ahead
If I've learned one thing, it's that you should never remove somebody else's code from an existing application. Adding new source code is a safer technique; I've found that I can still do what I want and avoid the monumental effort of understanding somebody else's work. It also makes it much easier to port my work to newer versions of the application.
You can do two things to help understand how DBmail works:
- Create an entity relationship diagram for the PostgreSQL database.
- Configure PostgreSQL to echo SQL commands to syslog and see what's going on.
An entity relationship diagram, or ERD, is a graphical representation of the database. An ERD typically shows the column names, their data types, and the relationships between the tables through the use of foreign keys. Figure 2 shows the ERD for the DBmail/PostgreSQL database.

Figure 2. Entity relationship diagram
for the email database imap_mail
The easiest way to obtain the ERD is to use the psql client to
read the relationships between the tables using the meta command
\d. Figure 3 shows a screenshot of the table description for
messages.

Figure 3. Using psql to obtain a table
description
I need to work with only a few of the DBmail tables: mailboxes,
messages, and messageblks.
PostgreSQL provides you the ability to record all SQL statements via the
syslog daemon. To invoke it, adjust the data cluster's configuration file
postgresql.conf by inserting the following lines:
syslog = 2
syslog_facility = 'LOCAL0'
syslog_ident = 'postgres'
log_statement = true
log_timestamp = true
The changes take effect once you've reloaded the service:
% /etc/init.d/postgresql reload
The DBmail error message levels are configured in the "dbmail.conf" file. Syslog places them in the mail log files ie:
/var/log/mail.warn/var/log/mail.err/var/log/mail.log
You can read the syslog files using tail and
grep:
% tail -n 100 /var/log/messages | grep postgres
The Hacks
The biggest challenge was understanding how DBmail worked; the rest was pretty easy.
Hack 1: Text Searching
This first hack searches all email for the text
roman. Once it finds a message that matches, the hack moves the
message from the INBOX to a subfolder named v_1.
The search begins when the user creates the subfolder v_1. The
IMAP server recieves the instruction to create the folder from the client and
sends the appropriate SQL to the back end. The trigger tr_v_1,
attached to the table mailboxes, executes the function
f_v_1() when an INSERT attempts to create the folder
INBOX/v_1. The text search begins here. A SQL CASE statement
tests each email in the INBOX. When the function finds an email with the
desired text, it calls another function, f_update_messages, to
update the messages table with an UPDATE to the message's location from
INBOX to INBOX/v_1.
The trigger tr_v_1 attaches the function f_v_1()
to the mailboxes table. The function will trigger on every INSERT
made to this table:
CREATE TRIGGER tr_v_1 AFTER INSERT ON mailboxes
FOR EACH ROW EXECUTE PROCEDURE f_v_1();
The function f_v_1() starts by testing the new name of the new
folder:
IF NEW.name = ''INBOX/v_1'' THEN
.
.
END IF;
If the name of the created folder is v_1
(INBOX/v_1), it performs a text search for the strings
roman and rome.
I've used the PERFORM statement instead of SELECT because I'm interested in
the execution of the function f_update_messages() rather than any
text string that the CASE statement may return. The CASE statement allows me to
test for the desired text string that's located in the messageblk
table by JOINing on the mailbox table:
PERFORM
CASE
WHEN messageblk ~* ''roman'' OR messageblk ~* ''rome''
THEN f_update_messages(NEW.mailbox_idnr,mblk.message_idnr)
ELSE ''other''
END AS "search results"
FROM messageblks mblk, messages m, mailboxes mb
WHERE
mblk.message_idnr = m.message_idnr AND
m.mailbox_idnr = mb.mailbox_idnr;
This calls the function f_update_messages, passing the mailbox
identity number and the messageblks identity number as parameters. It then
reassigns new identity numbers that effectively reassign the email's location
from INBOX to the newly created subfolder v_1:
UPDATE messages SET mailbox_idnr=new_mailbox_idnr
WHERE message_idnr=new_message_idnr;
Hack 2: Converting Word Attachments to Plain Text
This second hack converts all email with a Microsoft Word attachment to text and appends it to the original email.
This hack starts out the same way the first one does, as the attempt of the
DBmail server to INSERT data into the table messageblks triggers
it. DBmail updates the email record located in this table when the client moves
email from one folder to another.
The f_v_2 function performs multiple actions:
- Identifies whether a message has attachments
- Identifies whether the attachment is a Microsoft Word document
- Isolates and extracts the Word document from the email
- Converts it from its Base64 encoding to its binary form
- Converts the binary form to plain text
- Appends the plain-text transformation to the original email
The conversion operations use the command-line utilities
uudecode and antiword, which execute within the
plperlu function f_doc2txt(text).
The trigger tr_v_2 attaches the function f_v_2()
to the table messageblks. The function will execute for every
INSERT made to this table:
CREATE TRIGGER tr_v_2 BEFORE INSERT ON messageblks
FOR EACH ROW EXECUTE PROCEDURE f_v_2();
The function f_v_2() starts by identifying the mailbox's
folder name:
SELECT INTO mbx_name mailboxes.name
FROM mailboxes, messages
WHERE
mailboxes.mailbox_idnr=messages.mailbox_idnr AND
messages.message_idnr=NEW.message_idnr;
DBmail stores messages in two parts: the header and the message body.
This function identifies emails' Microsoft Word attachments while it moves them to
v_2. A positively identified message will fill in several
variables with information about all of the message's parts. This in essence
reconstructs the message:
SELECT INTO mail_header messageblk
FROM messageblks
WHERE messageblk_idnr=NEW.messageblk_idnr-1;
IF mail_header ~ ''multipart/mixed''
AND NEW.messageblk ~ content_type THEN
mime_boundary=trim(both ''"'' from
substring(mail_header from ''".*"''));
NEW.messageblk = f_get_document(NEW.messageblk,mime_boundary);
END IF;
The function f_get_document extracts the Microsoft Word MIME
attachment:
CREATE OR REPLACE FUNCTION f_get_document(text,text)
RETURNS TEXT AS'
DECLARE
mail_attachment ALIAS FOR $1;
mime_boundary ALIAS FOR $2;
mime_header TEXT;
str TEXT;BEGIN
mime_header = ''--''||mime_boundary||''\nContent-Type: text/plain; \
name=document.txt\n'';
mime_header = mime_header||''Content-Transfer-Encoding: 7bit\n'';
mime_header = mime_header||''Content-Disposition: inline; \
filename="document.txt"\n'';
str=substring(mail_attachment from \
''Content-Type: application/MS Word.*''||mime_boundary);
str=substring(str from ''\\n\\n.*\\n\\n'');
str=split_part(str,''\\n\\n'',2);
RETURN mime_header||f_doc2txt(str)||''--''||mime_boundary||''--'';
END;
'LANGUAGE plpgsql;
Notice that the function adds the header and MIME boundary to the Word
document, then turns the document into a text file by calling the function
f_doc2txt(). The Base64 encoding attachment, str, is
the function's parameter:
RETURN mime_header||f_doc2txt(str)||''--''||mime_boundary||''--'';
The function f_doc2txt() is short. It performs minor
formatting to prepare the attachment for the uudecode utility which turns it
back into its binary form. It then pipes the binary stream into
antiword, which converts it into a text stream:
CREATE OR REPLACE FUNCTION f_doc2txt(text) RETURNS TEXT AS'
use strict;
my $str="begin-base64 600 zzz\n".$_[0]."====";
$str= `echo "$str" | uudecode -o /dev/stdout | antiword - -i0`;
return $str;
' LANGUAGE plperlu;
The function returns this where it replaces the Word attachment with a
plain-text translation in the f_v_2() function:
NEW.messageblk = f_get_document(NEW.messageblk,mime_boundary);
.
.
.
RETURN NEW;
Tips
Read the SQL source code at the same time you read this article.
The DBmail documentation can help you configure mail agents such as Postfix to pipe email directly into the database.
DBmail is changing rapidly, so refer to the home site for the latest news.
Don't be bashful! Go to the PostgreSQL site to learn more about triggers and the amazing number of currently supported programming languages.
Conclusion
There's so much you can do to get the most out of your email. By using triggers in your database engine, you can enhance any existing behavior into new and creative solutions. All it takes is a little imagination.
Here are a couple of ideas to whet your appetite:
- Sort and cross-index your email under several categories or searches, and place them in several folders at the same time.
- Translate email into another language by piping your correspondence through an online translation facility.
- Create and index your email, similar to Google, using PostgreSQL's tsearch advanced text-searching capabilities.
I began this article intending to highlight data-mining strategies, but as you can see it's hard to stick to this subject without also considering reverse-engineering techniques.
References
dbmailman pages, specifically:man dbmail-adduser, which adds mail users to the Dbmail mail systemman dbmail-imapd, the imap4r1 front end for the Dbmail mail systemman dbmail-maintenance, which performs cleanup and integrity checksman dbmail-pop3d, the POP3d front end for the Dbmail mail systemman dbmail-smtp, which inserts messages into the Dbmail mail system
man uudecodeman antiword- DBmail
- PostgreSQL
- RFC 3501, the IMAP protocol
Robert Bernier is the PostgreSQL business intelligence analyst for SRA America, a subsidiary of Software Research America (SRA).
Return to ONLamp.com
You must be logged in to the O'Reilly Network to post a talkback.




