Now there's an open-source solution for this simple clustering configuration. Continuent is a database-independent project that handles clustering and provides simple management interfaces.
Last week I talked to Continuent spokesperson Emmanuel Cecchet. The project has employed eight engineers since January of this year and is funded by Emic Networks, a long-time provider of clusters for MySQL. The code is released under the Apache Public License.
The idea behind Continuent is that you can simply run its basic software, known as Sequoia, on two or more systems that host databases and have it handle your clustering. Any query directed to one system is automatically broadcast to the others.
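To make the broadcast idea concrete, here is a minimal sketch in Python that uses in-memory SQLite databases as stand-ins for the backend databases. The class `ReplicatingController` and its methods are hypothetical illustrations, not Sequoia's API (Sequoia itself is Java middleware that applications reach through a JDBC driver). The sketch sends every write statement to every backend and, as a simplifying assumption, answers each read from a single replica, since all replicas hold the same data.

```python
import sqlite3


class ReplicatingController:
    """Hypothetical controller: broadcasts writes to every backend database."""

    def __init__(self, backend_count: int) -> None:
        # Each backend is an independent in-memory SQLite database.
        self.backends = [sqlite3.connect(":memory:") for _ in range(backend_count)]
        self._next_reader = 0

    def execute_write(self, sql: str, params: tuple = ()) -> None:
        # Send the same statement to every replica so they stay identical.
        for conn in self.backends:
            conn.execute(sql, params)
            conn.commit()

    def execute_read(self, sql: str, params: tuple = ()) -> list:
        # Any replica can answer a read; rotate through them round-robin.
        conn = self.backends[self._next_reader]
        self._next_reader = (self._next_reader + 1) % len(self.backends)
        return conn.execute(sql, params).fetchall()


controller = ReplicatingController(backend_count=3)
controller.execute_write("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
controller.execute_write("INSERT INTO items (name) VALUES (?)", ("widget",))
print(controller.execute_read("SELECT name FROM items"))  # [('widget',)]
```

Because the application talks only to the controller, the backends need no special hooks; they simply receive ordinary SQL.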
Sequoia handles transaction scheduling and allows all systems to be updated asynchronously at the speed of the fastest node. Sequoia also ensures that the user always gets data from a fresh copy where all updates have been applied. Failover is automatic.
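One way to picture how asynchronous updates can coexist with fresh reads is to track, for each backend, how many entries of a shared, totally ordered write log it has applied. The following is a simulation under that assumption, not Sequoia's actual scheduler: a write is acknowledged once the first live backend (standing in for the fastest node) has applied it, slower backends replay the log later, a read is routed only to a backend that has applied every logged write, and a backend marked as failed is skipped. All names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Backend:
    name: str
    applied: int = 0          # number of log entries this backend has executed
    alive: bool = True
    rows: list = field(default_factory=list)


class Scheduler:
    """Hypothetical scheduler: ordered write log, fresh-only reads, failover."""

    def __init__(self, backends: list) -> None:
        self.backends = backends
        self.log: list = []   # totally ordered writes

    def _apply_next(self, backend: Backend) -> None:
        backend.rows.append(self.log[backend.applied])
        backend.applied += 1

    def write(self, value: str) -> str:
        # Order the write, then acknowledge once the first live backend has it.
        self.log.append(value)
        for b in self.backends:
            if b.alive:
                while b.applied < len(self.log):
                    self._apply_next(b)
                return b.name
        raise RuntimeError("no live backend")

    def catch_up(self, backend: Backend) -> None:
        # Slower backends replay the log later, in the same order.
        while backend.alive and backend.applied < len(self.log):
            self._apply_next(backend)

    def read(self) -> list:
        # Only a live backend that has applied every logged write may answer.
        for b in self.backends:
            if b.alive and b.applied == len(self.log):
                return list(b.rows)
        raise RuntimeError("no fresh backend available")


a, b, c = Backend("a"), Backend("b"), Backend("c")
sched = Scheduler([a, b, c])
sched.write("row1")   # acknowledged by "a"; "b" and "c" still lag behind
sched.catch_up(b)
a.alive = False       # simulate a crash of node "a"
print(sched.read())   # ['row1'], served by "b", the only fresh live node
```

Node "c" is alive but stale, so it is never chosen for the read until it catches up; that is the "fresh copy" guarantee in miniature.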
The group communication layer behind Sequoia's replication is based on a component called Hedera, which lets developers plug in different implementations. Hedera currently comes with an implementation based on the popular JGroups group communication library.
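The value of a layer like Hedera is that replication code depends on an interface rather than on one group communication library. The sketch below shows that plug-in pattern in Python with an abstract base class; `GroupChannel` and `LoopbackChannel` are hypothetical illustrations, not Hedera's actual classes, and a real deployment would put a JGroups-backed implementation behind the same interface.

```python
from abc import ABC, abstractmethod
from typing import Callable


class GroupChannel(ABC):
    """Hypothetical plug-in interface for group communication."""

    @abstractmethod
    def join(self, member: str, on_message: Callable[[str, bytes], None]) -> None:
        """Add a member and register the callback that receives messages."""

    @abstractmethod
    def multicast(self, sender: str, payload: bytes) -> None:
        """Deliver payload to every member, in the same order for all of them."""


class LoopbackChannel(GroupChannel):
    """In-process implementation, useful only for tests and illustration."""

    def __init__(self) -> None:
        self._members: dict = {}

    def join(self, member, on_message):
        self._members[member] = on_message

    def multicast(self, sender, payload):
        # Delivering synchronously in one loop gives every member the same order.
        for callback in self._members.values():
            callback(sender, payload)


def make_replica(name: str, channel: GroupChannel, log: list) -> None:
    # Replication code sees only the abstract interface.
    channel.join(name, lambda sender, payload: log.append((name, sender, payload)))


received: list = []
channel: GroupChannel = LoopbackChannel()
make_replica("node1", channel, received)
make_replica("node2", channel, received)
channel.multicast("node1", b"UPDATE t SET x = 1")
```

Swapping `LoopbackChannel` for a different implementation would leave `make_replica` untouched, which is the point of the indirection.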
The broadcasting is more coarse-grained than the scheduling that databases do on their own, but it keeps the software database-independent and requires no special hooks into the databases. It has proven efficient enough for moderately heavy database use, particularly for the read-heavy applications (about 80% reads) that are the norm, but it also scales well with heavier write workloads.
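A back-of-the-envelope model shows why read-heavy mixes benefit most from this kind of replication. Assume, purely for illustration, N identical nodes that can each execute C statements per second, reads that run on one node, and writes that must run on every node; the model ignores scheduling overhead and asynchronous execution. With read fraction r and write fraction w = 1 - r, the cluster's total capacity must cover the work per client statement, so N * C = X * (r + w * N), and the cluster throughput is X = N * C / (r + w * N).

```python
def cluster_throughput(nodes: int, per_node: float, read_fraction: float) -> float:
    """Simplified capacity model: reads hit one node, writes hit all nodes.

    Total work per second across the cluster is nodes * per_node, and each
    client statement costs read_fraction * 1 + write_fraction * nodes units.
    """
    write_fraction = 1.0 - read_fraction
    return nodes * per_node / (read_fraction + write_fraction * nodes)


single = 1000.0  # statements per second one node can handle (illustrative)
for nodes in (1, 2, 4, 8):
    x = cluster_throughput(nodes, single, read_fraction=0.8)
    print(nodes, round(x / single, 2))
```

Under these assumptions, with 80% reads, four nodes deliver about 2.5 times the throughput of one node and eight deliver about 3.3 times. Real measurements depend on the workload and on how much of the write cost the scheduler can overlap.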
Continuent grew out of a project called c-jdbc, which was hosted at the ObjectWeb Consortium and proved quite popular, with 50,000 downloads. As the name suggests, the project is written in Java and started with a Java interface. It is now expanding to offer a C++ interface (called Carob) and to replace its cumbersome ODBC-to-JDBC bridge with a native ODBC implementation.
Management is through an Eclipse plug-in named Oak. The team hopes to work with the Eclipse database tools project to do further integration.
While Sequoia is usually employed with homogeneous database instances, some sites find it useful for migrating to a new version of a database. The new version can be added to the cluster dynamically and transparently while the administrators work out the kinks.
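Adding a node to a running cluster means bringing it up to date before it serves traffic. One common approach, assumed here rather than taken from Sequoia's documentation, is to restore the new node from a snapshot taken at a known position in the write log and then replay every write logged after that position. The sketch uses SQLite's `Connection.backup` as the snapshot mechanism; `write` and `join_from_snapshot` are hypothetical helpers.

```python
import sqlite3

log: list = []                         # ordered record of every write
primary = sqlite3.connect(":memory:")  # an existing cluster member


def write(statement: str) -> None:
    # Record the write in the log, then apply it to the existing member.
    log.append(statement)
    primary.execute(statement)
    primary.commit()


def join_from_snapshot(snapshot: sqlite3.Connection, checkpoint: int) -> sqlite3.Connection:
    """Hypothetical join: restore the snapshot, then replay later writes."""
    new_node = sqlite3.connect(":memory:")
    snapshot.backup(new_node)          # copy the state as of `checkpoint`
    for statement in log[checkpoint:]:
        new_node.execute(statement)    # replay writes made since the snapshot
    new_node.commit()
    return new_node


write("CREATE TABLE t (v INTEGER)")
write("INSERT INTO t VALUES (1)")

snapshot = sqlite3.connect(":memory:")
primary.backup(snapshot)               # snapshot taken at this log position
checkpoint = len(log)

write("INSERT INTO t VALUES (2)")      # the cluster keeps accepting writes

new_node = join_from_snapshot(snapshot, checkpoint)
print(new_node.execute("SELECT v FROM t ORDER BY v").fetchall())  # [(1,), (2,)]
```

Only after the replay finishes would the controller start sending the new node its share of reads and new writes.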
A few intrepid sites have also mixed databases from different vendors. For instance, if they consider it necessary to do sensitive and mission-critical work on Oracle, they may create a cluster with the critical data on Oracle and less critical data (such as static content) on MySQL. Different tables can be stored on different cluster nodes, and Sequoia directs queries to the appropriate node.
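Routing by table can be pictured as a lookup from table name to the backends that hold it. The sketch below is a deliberately naive Python illustration: the `TABLE_LOCATIONS` map, the backend names, and the regular expression that extracts the first table name from simple `SELECT`, `INSERT`, `UPDATE`, and `DELETE` statements are assumptions for illustration. A real controller needs a proper SQL parser and must also handle queries that span tables on different nodes.

```python
import re

# Hypothetical placement: critical tables on Oracle, static content on MySQL.
TABLE_LOCATIONS = {
    "orders": ["oracle-1"],
    "customers": ["oracle-1"],
    "static_pages": ["mysql-1", "mysql-2"],
}

# Captures the first table name that follows FROM, INTO, or UPDATE.
_TABLE_PATTERN = re.compile(
    r"\b(?:from|into|update)\s+([A-Za-z_][A-Za-z0-9_]*)", re.IGNORECASE
)


def route(sql: str) -> list:
    """Return the backends that hold the table referenced by `sql`."""
    match = _TABLE_PATTERN.search(sql)
    if match is None:
        raise ValueError(f"cannot find a table name in: {sql!r}")
    table = match.group(1).lower()
    if table not in TABLE_LOCATIONS:
        raise KeyError(f"no backend hosts table {table!r}")
    return TABLE_LOCATIONS[table]


print(route("SELECT title FROM static_pages WHERE id = 7"))       # ['mysql-1', 'mysql-2']
print(route("UPDATE orders SET status = 'shipped' WHERE id = 42"))  # ['oracle-1']
```

When a table lives on several backends, as `static_pages` does here, a write would go to all of them and a read to any one.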
Although c-jdbc was originally released under the LGPL, its team found the APL better suited to this project, mainly because the core interface and library are written in Java, and it's unclear how the LGPL applies to Java code. Cecchet said the team sensed that many potential contributors were keeping all their code proprietary because they could not be sure how to split it between free and proprietary components.
Andy Oram is an editor for O'Reilly Media, specializing in Linux and free software books, and a member of Computer Professionals for Social Responsibility. His web site is www.praxagora.com/andyo.
oreillynet.com Copyright © 2006 O'Reilly Media, Inc.