ONLamp.com    
 Published on ONLamp.com (http://www.onlamp.com/)
 See this if you're having trouble printing code examples


Building the PostgreSQL BuildFarm

by Andrew Dunstan
02/24/2005

PostgreSQL is a high-quality, full-featured open source database that runs on a huge number of platforms. The number of possible combinations of machine architecture, operating system, system libraries, compilers, and configuration sets is truly staggering. It also has just increased significantly with the recent release of the 8.0 version, which supports Windows natively for the first time. One of the problems that the PostgreSQL project faces, as many other similar projects do, is how to know whether some change has broken things on some platform. We don't have the resources to run every possible combination, nor even a tiny proportion of them. On several occasions it has happened that breakage only became apparent some time after a change went in. We created PostgreSQL BuildFarm to address that difficulty.

What's a Build Farm?

Because there was no way to create a testing laboratory with all the requisite hardware and software for the job, I decided instead to take a leaf out of the book of the Samba project, which has similar needs, and leverage the resources of some of our users and developers. The Samba Build Farm has operated for quite a while, and with good success. Our build farm has two parts: a central server that collects and processes build results, and member machines spread out across the world that run builds and report the results.

I started out building the client with two files, a script file and a configuration file. The script is written in Perl. There are several reasons for that. One is that I write a lot of Perl and I'm quite comfortable doing so. Another is that PostgreSQL uses Perl to build when pulling the source from its CVS repository, so I knew (or thought I knew) that any platform that could build PostgreSQL could run the build farm client. Because the script is Perl, it seemed easiest to make the config file Perl too. Essentially it constructs a Perl hash that contains all the configuration data--like where to get the source code and how, the credentials for the particular farm member, and so on.

The process runs something like this:

If any step fails, further processing stops and the program reports the failure along with all the logs relevant to the failed step.

The script has many options, most of which are useful only in debugging your setup. In normal use, it usually requires only the CVS branch name to test as a command-line parameter.

Related Reading

SQL in a Nutshell
A Desktop Quick Reference
By Kevin Kline

What's It Take?

The bandwidth requirements are quite modest. I have found that the best way by far is to run my own CVS mirror of the PostgreSQL sources, using the fine CVSup package. This took me an hour or two to set up, but it made checkouts lightning-fast. On the mirror machine, I sync with the PostgreSQL server every 15 minutes via a cron job. I have not had the slightest trouble with it since I set this up, and I have four client machines on my home network that all use the mirror--in one case locally and in the other three cases by way of a pserver I set up on the mirror machine.

The other useful efficiency gain I found was to use the Samba project's wonderful ccache package. Some of my machines are old and slow. In one case I found that using ccache reduced the time for a build farm script run from around an hour to about 15 minutes. This application is really a poster child for ccache.

While you don't need lots of bandwidth or processing power to run the build farm, you do need lots of disk space. With four branches supported, one of my members uses 433MB with no builds running. Each concurrent build can take another several hundred megabytes, although this is a requirement only during a run. Given the current price of disk storage, requiring lots of disk space for the build farm seems reasonable. If disk space is at a premium in a given system, you can reduce the requirements by not using ccache and not running more than one build at a time. Even then, usage can climb transiently to around 500MB for a single branch.

How Does It Work with Users?

Along the way to creating the build farm client, I encountered a few pitfalls. For example, having two instances running for the same branch using the same build root directory is not a good idea--the design doesn't really allow for that. The solution is to use a classic Unix-style lock file. If the script cannot obtain an exclusive lock on the file (in NOWAIT mode), it exits immediately. Another was that CVS operations can end up hanging if the CVS server has problems. The script therefore has optional code to force a time-out on CVS operations.

Once I had a client that would build and report its results (at that stage, it reported to the screen rather than to a server), I started to create the server. Obviously it needed a data store, and naturally it uses PostgreSQL. Then I had to decide on a transport mechanism. I decided on a simple HTTP/CGI transaction for several reasons. First, even when machines are behind firewalls, they most often have the ability to make outbound HTTP calls on port 80, even if everything else is blocked. Second, using CGI rather than a more structured system like SOAP meant that I needed fewer libraries on the client side. I (somewhat naively) thought that all modern Perl installations would have the LWP library and its friends. I constructed a very simple datastore consisting of two tables and a simple CGI script to accept results from a member and store them in the database. The CGI script is Perl, using the very standard CGI module to parse the query and the DBI and DBD::Pg modules to talk to the database.

One early design decision was that not just anyone could join the build farm. People would have to apply and receive approval. There are several reasons for this. It is important that members be reachable in case of errors. If your platform is having problems, we want you to be active in fixing it, as you might have the only such platform we know of, or indeed in existence. Preferably, you will already be a member of the PostgreSQL community and be known to me or someone else in that community, but I have accepted a few members I don't know. Lastly, we want coverage and diversity rather than mere numbers. There is no virtue in our having 100 identical FooLinux members--all it means is that we have more information to wade through. So you have to register, and later you will receive by email or other means a name and a secret. The name is an alias to identify you (all are animal names), and the secret is a signing key for any results you send to the server. Each request is signed with an SHA1 digest of the combination of the request and the secret. The server performs the same calculation and discards the results if the two don't match.

After a little experimentation, the database looked like this:


     Table "public.buildsystems"
      Column      | Type | Modifiers 
------------------+------+-----------
 name             | text | not null
 secret           | text | not null
 operating_system | text | not null
 os_version       | text | not null
 compiler         | text | not null
 compiler_version | text | not null
 architecture     | text | not null
 status           | text | not null
 sys_owner        | text | not null
 owner_email      | text | not null
Indexes:
    "buildsystems_pkey" primary key, btree (name)

                   Table "public.build_status"
        Column         |            Type             | Modifiers 
-----------------------+-----------------------------+-----------
 sysname               | text                        | not null
 snapshot              | timestamp without time zone | not null
 status                | integer                     | 
 stage                 | text                        | 
 log                   | text                        | 
 conf_sum              | text                        | 
 branch                | text                        | 
 changed_this_run      | text                        | 
 changed_since_success | text                        | 
Indexes:
    "build_status_pkey" primary key, btree (sysname, snapshot)
    "bs_branch_snapshot_idx" btree (branch, snapshot)
    "bs_sysname_branch_idx" btree (sysname, branch)
Foreign-key constraints:
    "bs_fk" FOREIGN KEY (sysname) REFERENCES buildsystems(name) 
	     ON UPDATE CASCADE ON DELETE CASCADE

The text type is an (almost) arbitrarily large character type in PostgreSQL and several other databases. I chose this type for most of the fields so as not to restrict the data unduly while learning exactly what sorts of data we would need. It is still the early days of the build farm. My intention is to standardize some of the fields in due course, especially operating_system, os_version, compiler, compiler_version, and architecture. The status field in build_status has proved fairly useless--it is the exit code returned by the stage that was last run. At some stage, we'll drop it. The snapshot is the time at which the program took a CVS snapshot of the branch in question. The conf_sum and especially log fields can grow quite large. They contain the log for the stage that failed, if any, and the configuration settings and detected machine information for the build farm run.

Once I had the database in place and a script to add status reports to the database via a web transaction, I modified the client code to call the web transaction instead of reporting to the screen. Then I needed two more pieces: registration and display of status. Registration required one small static HTML page with a form and a tiny CGI script. Each person's application is put in the database with a status of pending, and they cannot upload results until they receive a name and secret. Then the status changes to approved. I currently do this manually by logging in to the database and calling a stored procedure. Lastly, there are four pages that display the results. Three are summaries, and one displays all the information about a particular build. They are all written in Perl and all use Andy Wardley's superb Template Toolkit package for handling the presentation. Michael Glaesemann then very kindly designed a nice skin for the site for me--and that brings us to the state the build farm is in today.

How Do You Use It?

The client code is available at pgFoundry, the PostgreSQL community development site, as is the CVS repository, so anyone can download it and play around. You don't need to be an approved member; there's a special nosend mode that does everything except actually upload results, and a couple of levels of verbosity that let you see what's happening.

The client code, by design, runs unattended from cron. Here are the relevant cron entries from one of my machines:

25 * * * * cd /home/andrew/pgbf && ./run_build.pl
55 1 * * * cd /home/andrew/pgbf && ./run_build.pl REL8_0_STABLE
55 3 * * * cd /home/andrew/pgbf && ./run_build.pl REL7_4_STABLE
55 5 * * * cd /home/andrew/pgbf && ./run_build.pl REL7_3_STABLE
55 7 * * * cd /home/andrew/pgbf && ./run_build.pl REL7_2_STABLE

This means I try to build the HEAD branch (the main development branch) from CVS every hour if there have been changes. I check stable branches once a day.

One of the biggest and most frustrating challenges has been to make the client work on Windows. The only approved build environment for the new Windows port is the package sets from the Mingw project. These do some very cool things but have posed a couple of problems for build farms. The biggest problem is that it involves a virtual environment, in which you can refer to Unix-like paths such as /usr/local that do not in fact exist. That is fine while you stay within the environment, but step outside it and try to use them, and you'll have trouble. The Mingw DTK does have a version of Perl, so you can run it from inside Mingw. Unfortunately, that Perl does not come with the packages required to do the web transaction. It can do all the rest of the build farm process just fine, but not that part. My attempts to add those modules onto the Mingw Perl failed miserably, as did my attempts to convince a non-Mingw Perl to run the pre-web transaction part of the build farm. The Solomonic solution has been to split the script in two. The Mingw Perl runs the build, collects the results, and calls the Perl Data::Dumper module to serialize the data, which it prints to a file. It then calls another script that reads that file, calls Perl's eval on the contents, and then performs the web transaction. The second script runs standard Windows Perl and doesn't need to know anything of the Mingw paths. This solution is now in the latest release of the client code. The code for Unix is identical; you just use the same Perl interpreter for both pieces.

What's Next?

There is plenty more work to go. Things under consideration include:

We also need to broaden the coverage of the build farm, now that it has proved to be viable.

The PostgreSQL BuildFarm is a classic case of distributed computing. We have a task that needs to occur over and on a large number of different platforms. Rather than try to assemble them all, we use instances of such platforms in situ, and their owners devote a few spare computing cycles to the task. It demonstrates the power of the open source community. It also provides a practical way that many users can contribute back to the project, even if they are not developers.

The PostgreSQL BuildFarm server runs on a equipment and bandwidth provided by the fine people at CommandPrompt, for which they have my thanks and those of the PostgreSQL community.

Andrew Dunstan works for a small consulting and software company in the Triangle area of North Carolina, and contributes to PostgreSQL as an enthusiastic hobbyist as well as a sometime professional user.


Return to ONLamp.com.

Copyright © 2009 O'Reilly Media, Inc.