Preserving Backward Compatibility
Planning ahead
The best way to maintain protocol and data format compatibility is to plan ahead: design your protocols and data formats so that you can add things to them in the future without disturbing prior versions of the code. This means two things. Old code must be able to ignore new elements in your files or data streams that it doesn't understand, and new code must be able to deal with the absence of those elements.
Use of XML
The canonical place that Subversion uses this technique is within the XML
data formats used in the working copy libraries (for example, the
.svn/entries files) and the DAV-based network protocol used by
libsvn_ra_dav and mod_dav_svn. I'm not the biggest
fan of XML, but it does make it pretty simple to create formats and protocols
that can be extended later with a minimum of problems.
Specifically, the use of XML in libsvn_ra_dav has simplified
the process of adding parameters to functions in the repository access API. For
example, when I added the --limit parameter to the svn
log command, I had to transmit that parameter to the server so it could
pass it on to the libsvn_repos-level log functions. Because the
functions in question simply send a report to the server in XML form, all that was required was to add a new element containing the parameter. New servers simply
look for the new element, and if it isn't there, they assume it wasn't sent,
preserving compatibility with old clients. Old servers ignore the new element
because they don't understand it, and the client code simply recognizes the case
by noticing when it has received more than the requested number of log entries
and ignoring the rest, allowing the new parameter to work even with a server
that does not understand it.
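The exchange described above can be sketched as follows. This is a minimal illustration in Python rather than Subversion's C, and the element names (log-report, start-revision, end-revision, limit) and function names are assumptions made for the example, not the real DAV report format.

```python
import xml.etree.ElementTree as ET

def build_log_report(start, end, limit=None):
    """Client side: build the request; <limit> is the newly added, optional element."""
    root = ET.Element("log-report")
    ET.SubElement(root, "start-revision").text = str(start)
    ET.SubElement(root, "end-revision").text = str(end)
    if limit is not None:
        ET.SubElement(root, "limit").text = str(limit)
    return ET.tostring(root, encoding="unicode")

def old_server(report_xml, history):
    """Old server: reads only the elements it knows, so <limit> is ignored."""
    root = ET.fromstring(report_xml)
    start = int(root.findtext("start-revision"))
    end = int(root.findtext("end-revision"))
    return [rev for rev in history if start <= rev <= end]

def new_server(report_xml, history):
    """New server: honors <limit> when present, behaves as before when absent."""
    root = ET.fromstring(report_xml)
    entries = old_server(report_xml, history)
    limit_text = root.findtext("limit")  # None when an old client sent the report
    if limit_text is not None:
        entries = entries[:int(limit_text)]
    return entries

def client_log(server, history, start, end, limit=None):
    """Client: drop any surplus entries in case the server ignored <limit>."""
    entries = server(build_log_report(start, end, limit), history)
    if limit is not None:
        entries = entries[:limit]
    return entries
```

Calling client_log with limit=3 gives the same three entries against either server. The only difference is that the old server does the extra work of sending entries the client then discards.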
Custom protocol design
Of course, you don't need to use XML in order to ensure forward and backward
compatibility in your protocols. Subversion's libsvn_ra_svn and
svnserve have a custom protocol that uses many of the same tricks
you might use in an XML-based protocol. The svn:// protocol sends
data across the network encoded in tuples: lists that are known to
contain certain items. The functions for reading tuples off the wire ignore
extra entries in the tuple, so you can add new parameters and old servers will
ignore them, just as in the DAV-based protocol.
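A rough sketch of a tuple reader with that property follows. It assumes the tuple has already been decoded into a Python list, and the function name read_tuple and the field names are hypothetical, not taken from libsvn_ra_svn.

```python
def read_tuple(items, required, optional=()):
    """Unpack a decoded tuple, ignoring trailing entries this version does not know.

    required: names of the fields that must be present, in order.
    optional: (name, default) pairs for fields added in later versions.
    """
    if len(items) < len(required):
        raise ValueError(
            f"expected at least {len(required)} items, got {len(items)}")
    result = dict(zip(required, items))
    rest = items[len(required):]
    for index, (name, default) in enumerate(optional):
        # A missing optional field falls back to its default.
        result[name] = rest[index] if index < len(rest) else default
    # Anything beyond the known fields is silently dropped.
    return result

# An old reader knows two fields and ignores the third entry.
old_view = read_tuple([1, 10, 5], ("start", "end"))

# A new reader accepts the third entry, or supplies a default for old senders.
new_view = read_tuple([1, 10, 5], ("start", "end"), (("limit", 0),))
from_old_sender = read_tuple([1, 10], ("start", "end"), (("limit", 0),))
```

The key design choice is that new fields may only be appended, never inserted in the middle, so positions that old code relies on keep their meaning.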
Additionally, the svn:// protocol includes in its initial
handshake a minimum and maximum protocol version and list of capabilities
supported by the server and client. Thus, both the client and server have a chance
to adjust to the exact version of the protocol being spoken at the other end of
the connection. This allowed the addition of pipelining to the protocol shortly
before the release of Subversion 1.0 while ensuring that old clients continued
to work. See the subversion/libsvn_ra_svn/protocol file for more
details on how the svn:// protocol works.
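As a hedged illustration of that kind of handshake, here is one way the negotiation could look. The dictionary layout, the negotiate function, and the "pipelining" capability string are assumptions for the example; the protocol file mentioned above is the authority on what the real handshake contains.

```python
def negotiate(client, server):
    """Pick the highest protocol version both ends speak, plus shared capabilities.

    Each side is a dict with 'min' and 'max' (ints) and 'caps' (a set of strings).
    """
    low = max(client["min"], server["min"])
    high = min(client["max"], server["max"])
    if low > high:
        raise ConnectionError(
            f"no common protocol version: client speaks "
            f"{client['min']}-{client['max']}, "
            f"server speaks {server['min']}-{server['max']}")
    # Only features that both ends announced may be used on this connection.
    return high, client["caps"] & server["caps"]

server = {"min": 1, "max": 2, "caps": {"pipelining"}}
old_client = {"min": 1, "max": 1, "caps": set()}
new_client = {"min": 1, "max": 2, "caps": {"pipelining"}}

old_session = negotiate(old_client, server)  # falls back to version 1, no extras
new_session = negotiate(new_client, server)  # version 2 with the shared capability
```

Because each optional feature is gated on the intersection of capabilities, a new server never sends an old client something it did not ask for.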
Upgrade paths
For changes that are forward-compatible but not backward-compatible, what matters most is to provide a smooth upgrade path. There are two main ways of doing this, both of which Subversion has used at various times.
Ease in the change
Long before Subversion hit 1.0, the developers made the decision to change
the format used when storing timestamps in the working copy code. The change
occurred slowly, over the course of a few releases. First came support
for reading the new format, so the code that parses timestamps would try the
new format, and if that failed it would try the old format before finally
returning an error if that failed. Then, after that code had been out in the
wild for a while, libsvn_wc changed so that it wrote out
timestamps in the new format. The pre-1.0 policy for upgrades was to ensure
compatibility only within a single version. Because the support for reading the
new format went in a version before the introduction of support for writing the
new format, the project retained that compatibility. Support for the old date
format exists to this day in Subversion's timestamp-parsing code, but nothing
has written out dates in that format in quite some time.
What's important to keep in mind here is that introducing the change slowly let users revert from the new version (which produced the new-format timestamps) to the previous version (which knew how to read them) on the off chance that they encountered some sort of problem with their upgrade.
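The two-step rollout can be sketched like this. Both format strings are placeholders chosen for the example, not the formats Subversion actually used, and the write_new_format switch stands in for "which release is this".

```python
from datetime import datetime

NEW_FORMAT = "%Y-%m-%dT%H:%M:%S"   # assumed new format, for illustration only
OLD_FORMAT = "%d/%m/%Y %H:%M:%S"   # assumed old format, for illustration only

def parse_timestamp(text):
    """Reader shipped in the first release: try the new format, then the old one."""
    for fmt in (NEW_FORMAT, OLD_FORMAT):
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognized timestamp: {text!r}")

def format_timestamp(when, write_new_format):
    """Writer: only a later release turns write_new_format on."""
    return when.strftime(NEW_FORMAT if write_new_format else OLD_FORMAT)

stamp = datetime(2004, 2, 23, 12, 30, 0)
# Release N writes the old format but already reads both.
# Release N+1 writes the new format, which release N can still read.
written_by_next_release = format_timestamp(stamp, write_new_format=True)
read_by_previous_release = parse_timestamp(written_by_next_release)
```

The reader never loses the old branch, which is why data written years ago stays readable even after nothing produces that format anymore.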
Detect the incompatibility and compensate
The addition of UUIDs to Subversion repositories is another example of how
to change an on-disk format in a backward-compatible way. Originally
Subversion repositories did not have any unique identifier; features like
svn switch --relocate were dangerous because you couldn't ensure
that both URLs referred to the same repository. To solve the problem, each
repository now has a universally unique ID stored in a new database table
(because at the time, the only filesystem back end that existed was the Berkeley
DB one). To ensure that new code worked with repositories created prior to the
addition of this feature, the lack of this table simply caused the function
that returns the repository's UUID to create the table itself, seamlessly
upgrading the repository without the user ever being aware of it.
The important item to note here is that if you can possibly make the upgrade automatic from the point of view of the user, then you should do so. Avoiding manual steps can only be a good thing.
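Here is a small sketch of that upgrade-on-first-use idea. It uses SQLite from the Python standard library as a stand-in for the Berkeley DB back end, and the table name, column names, and function name are invented for the example.

```python
import sqlite3
import uuid

def get_repository_uuid(db):
    """Return the repository UUID, upgrading an old repository on first use."""
    # Old repositories have no such table; creating it here is the silent upgrade.
    db.execute(
        "CREATE TABLE IF NOT EXISTS uuids "
        "(id INTEGER PRIMARY KEY, uuid TEXT NOT NULL)")
    row = db.execute("SELECT uuid FROM uuids WHERE id = 1").fetchone()
    if row is None:
        new_uuid = str(uuid.uuid4())
        db.execute("INSERT INTO uuids (id, uuid) VALUES (1, ?)", (new_uuid,))
        db.commit()
        return new_uuid
    return row[0]

# A freshly opened in-memory database plays the part of a pre-UUID repository.
repo = sqlite3.connect(":memory:")
first = get_repository_uuid(repo)    # creates the table and the UUID
second = get_repository_uuid(repo)   # finds the existing UUID
```

The caller cannot tell whether the repository was upgraded during the call, which is the point: the missing table is treated as a normal state rather than an error.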
Dependency problems
One place where it's easy to forget about compatibility problems is in your project's dependencies. Any libraries you link against or external programs you use will each have their own compatibility issues, just as you will. It's important to be aware of those issues when deciding to make use of a third-party product. In Subversion we've had at least three separate dependencies that cause compatibility problems. Some are internal to Subversion, and some poke through to users from time to time.
Dependencies that show through your API
The most important kind of dependency you need to worry about is one that shows up in your public API. This can happen when you use data types defined in the library as arguments to your library's functions, such as with the Apache Portable Runtime (APR) in Subversion. Any non-backward-compatible change that occurs in the library you depend on will instantly affect you as soon as your users try to upgrade to a new version of the dependency.
When Subversion first hit 1.0, the only released version of APR was from the 0.9.x series of releases. Because Subversion uses APR in almost every part of its public interface, this means that to maintain ABI compatibility, all releases of Subversion within its 1.x branch only officially support the use of APR 0.9.x releases. While Subversion does happen to work with APR 1.0.x, official builds use 0.9.x.
The primary reason Subversion can't use APR 1.0.x is that the size of the
data type apr_off_t has changed from off_t (often 32
bits long on a 32-bit system) to long long (64 bits long on a typical 32-bit system). This change was necessary for interoperating with programs (Perl,
for example) that redefine the size of an off_t via the
_FILE_OFFSET_BITS define. Because apr_off_t shows up
in the public Subversion API, this change makes versions of Subversion compiled
with APR 1.0.x instantly incompatible with versions compiled with APR 0.9.x.
Additionally, APR uses a set of compatibility rules that allow it to drop and
change parts of its public API between major versions, so any of those kinds of
changes will cause similar types of problems as the apr_off_t
changes.
The important lesson to learn from this is that as soon as you let into your program a data type defined by your dependency, its compatibility issues instantly become your compatibility issues.
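To make the breakage concrete, the sketch below uses Python's ctypes module to lay out the same hypothetical public structure twice: once with a 32-bit offset field and once with a 64-bit one. The structure and field names are made up for illustration and do not correspond to any real Subversion type.

```python
import ctypes

class RecordOld(ctypes.Structure):
    # Built against a dependency whose offset type is 32 bits wide.
    _fields_ = [("size", ctypes.c_int32), ("kind", ctypes.c_int32)]

class RecordNew(ctypes.Structure):
    # Same declaration, rebuilt against a dependency whose type is 64 bits wide.
    _fields_ = [("size", ctypes.c_int64), ("kind", ctypes.c_int32)]

# The field after the offset has moved, so the two builds disagree on layout.
old_kind_offset = RecordOld.kind.offset
new_kind_offset = RecordNew.kind.offset

# A caller compiled against the old layout reads a record produced by new code.
produced = RecordNew(size=5, kind=2)
raw = bytes(produced)[:ctypes.sizeof(RecordOld)]
misread = RecordOld.from_buffer_copy(raw)
# misread.kind no longer holds 2: it is reading part of the widened size field.
```

Nothing in the source code of the caller changed, yet it now reads the wrong bytes. That is what an ABI break looks like from the outside.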
Dependencies that are hidden by your API
An interesting counterexample in Subversion's case is the Neon library, which Subversion uses as its HTTP/WebDAV client library. Neon differs from APR in two ways. First, Neon doesn't make it into Subversion's public interface, so changes to Neon's data types have a harder time making themselves seen to clients of Subversion itself. Second, Neon's interface is far less stable than APR's is. Even APR 0.9.x, despite its pre-1.0 version number, provides a high level of stability in its API. Neon has never professed to do so, with nontrivial changes in its API being reasonably common.
This means that in order to support multiple versions of Neon, Subversion
needs to jump through a few hoops. That has happened at least once, with
nontrivial amounts of shim code being introduced to libsvn_ra_dav
in order to account for changes in the Neon API as a result. This allowed
Subversion to function with either the old Neon API or the new one for a
reasonable amount of time while users upgraded.
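The general shape of such a shim looks something like the following. Both library classes and their method names are invented stand-ins, not Neon's actual API. In C the same choice is typically made at build time with preprocessor conditionals rather than at run time.

```python
class OldHttpLib:
    """Stand-in for an older release of a dependency (hypothetical API)."""
    def request_create(self, method, path):
        return f"{method} {path}"

class NewHttpLib:
    """Stand-in for a newer release that renamed and reordered the call."""
    def create_request(self, path, method):
        return f"{method} {path}"

def make_request(lib, method, path):
    """Shim: the rest of the program calls this and never the library directly."""
    if hasattr(lib, "create_request"):
        # Newer API is available.
        return lib.create_request(path, method)
    # Fall back to the older API.
    return lib.request_create(method, path)

with_old = make_request(OldHttpLib(), "GET", "/repos/trunk")
with_new = make_request(NewHttpLib(), "GET", "/repos/trunk")
```

Keeping every call to the dependency behind one function means an API change in the library touches the shim and nothing else.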
While backflips like the shim code in libsvn_ra_dav ease the
burden on its users, they don't solve all the problems. If a program uses Neon
directly in its own code as well as the Subversion APIs, it's possible for Neon
upgrades required by Subversion to break backward compatibility. It's not
yet clear what the best way to handle this kind of change is.
The lesson is the same as before: once you use a library, its compatibility issues are your compatibility issues.
Dependencies that show through your on-disk formats
The last type of compatibility problem that a third-party library can introduce is when the library is responsible for the on-disk format of some of your data, as in the case of Berkeley DB as used by Subversion. Upgrading to a new version of the library can result in unexpected problems if the disk formats are incompatible. This has resulted in significant issues, mainly because Berkeley DB upgrades often require manual intervention, ranging from a full dump/load cycle to a simple recovery. Vendors and distributors often package Berkeley DB so that upgrades may occur without the user's conscious action.
There's not much more to say about this kind of compatibility problem, other than that the only real solution is education. Users need to understand the issues upgrades can bring, and ideally the errors that result should say clearly what has gone wrong. Unfortunately, users often feel terror at the sudden inability to access their data, so panic may outweigh education in some cases.
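One way to make such errors more informative, offered purely as an assumption and not as something Subversion is known to do, is to record the storage library's version alongside the data and compare it before opening anything. The function name and message wording below are illustrative.

```python
def check_storage_version(recorded, linked):
    """Compare the (major, minor) version stored with the data to the one in use.

    Returns None when they match, otherwise a message meant for the user.
    """
    if recorded == linked:
        return None
    return (
        "This data was last written with storage library "
        f"{recorded[0]}.{recorded[1]}, but this program is using "
        f"{linked[0]}.{linked[1]}. Nothing has been modified. "
        "Check the upgrade notes for the storage library before continuing."
    )

# The check runs before the data is opened, so a mismatch changes nothing on disk.
message = check_storage_version((4, 2), (4, 3))
```

A message like this tells the user what changed and that the data has not been touched, which addresses the panic as much as the problem.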
Making Compatibility Decisions
Now that you've learned about the types of compatibility, seen some tricks you can use to help maintain them, and heard about some specific examples of how such problems can occur, it's time to think about your specific application and how these issues apply to you.
First, consider your user base. If you have only a dozen highly technical users, jumping through hoops to maintain backward compatibility may be more trouble than it's worth. On the other hand, if you have hundreds or thousands of nontechnical users who cannot deal with manual upgrade steps, you need to spend a lot of time worrying about those kinds of issues. Otherwise, the first time you break compatibility you'll easily burn through all the goodwill you built up with your users by providing them with a useful program. It's remarkable how easily people forget the good things about a program as soon as they encounter the first real problem.
Next, consider your project. If you don't actually provide a library your users embed in their own application, worrying about API and ABI stability is pointless. Similarly, if you don't store data on disk or send it over the network, the issues associated with those activities are moot. It's rare that a program has no compatibility issues at all, but it's also rare for one to encounter all the issues described in this article.
In Conclusion
Consider again the example of the Subversion project. Subversion's compatibility policy appears in the "Release numbering, compatibility, and deprecation" section of the HACKING file in the top level of its source tree. You can upgrade and downgrade within a single minor release cycle without issue. You can upgrade to new versions in the same major release cycle without issue. When the major version number changes, all bets are off. These rules apply to API/ABI issues, data format issues, and network protocol issues.
Has the project followed this policy? The answer, as is often the case with software engineering, is a qualified yes. In one instance, Subversion added a function in a nonminor release as part of a fix for a security problem, and that change broke the ability to go back and forth within that specific minor version. The nature of the security problem meant sacrificing compatibility in this particular case.
Other than that, though, the policy has been a success. Users have upgraded to new versions of Subversion without fear. Various versions of the official client and server, and even third-party clients that implement the same protocols, have also enjoyed continued compatibility. The users seem happy with the compatibility promises, and the developers are not overly hampered by them. It isn't always easy, but in my opinion it's been worth it.
All projects need to consider compatibility. The issues are rarely as simple as you might like, and they require serious thought for each project, as no two are the same. Finally, be aware that worrying too much about compatibility can cripple you, so it's important not to place too high a price on it. Only you can determine how high is too high. I hope this article has given you a starting place for making that determination.
Garrett Rooney is a software developer at FactSet Research Systems, where he works on real-time market data.