SQL has many language elements that are not commonly introduced in quick surveys of the language. Because SQL is based on relational algebra, it should come as no surprise that the standard contains several set-oriented operators.
In the interest of completeness, and to provide a tool that is extremely useful in dealing with heterogeneous data, it's time we take a look at the UNION operator.
If you were to dredge up the memories of middle school, you'll probably remember vague notions of learning about sets. If you have ever studied logic, you may have even worked with Venn diagrams.
Sets are simply groups of discrete elements. In the strict mathematical world, each set is made up of unique numerical elements; in the SQL world, a set is a collection of rows of data that may or may not contain duplicates. You've actually been working with SQL sets for quite a while -- queries are simply a set (strictly speaking, a subset) of a database table.
There are three simple ways to compare sets:
While all of these functions are defined in the SQL standard (at least in the SQL-92 standard), the exact implementation varies widely by vendor. Many database engines leave out some or all of these functions because they are not used as frequently as other commands. Or vendors may omit them because the function of the operators can be recreated using SQL and/or the database engine's programming language. In rough order of implementation, most databases allow the UNION operator, some provide the INTERSECT operator, and a few provide the functionality of the EXCEPT operator using a different operator name (for example, Oracle's MINUS operator).
|
Also in aboutSQL: |
I'm sure that some of you may be wondering why I dedicated a column to this dusty corner of the SQL world. Fear not! There's a very good reason to learn about SQL sets, particularly the UNION operator.
Working with sets provides the flexibility to manipulate two tables that otherwise have no relationship -- where joins are not possible or useful. Let's start with an simple example. You have a database with a table of employees, and a table of suppliers.
Employees |
|||
| EmployeeID | EmployeeName | Address | ZIP |
| 61 | Sue Smith | 100 South St. | 12345 |
| 62 | David Jones | 2525 1st St. | 12345 |
Suppliers |
|||
| SupplierID | Contact | Address | ZIP |
| A1-674 | Troy Parker | 1100 Main St. | 23456 |
| ZZ-1A1 | Claire Smith-Jones | 400 East Main, Apt 5 | 56789 |
If you were sending Christmas cards to all suppliers and employees, you could certainly use a single table of name, address, and ZIP code data that could be the data source for a mail merge. Of course, you could create a new table, execute two SQL statements to insert the appropriate columns, and call it a day.
|
![]() SQL in a Nutshell |
|||||
But what if this is an activity that you need to repeat frequently? Or what if the dataset is large enough that it is impractical to create a new database table? Or what if you simply don't have the permissions to do anything other than SELECT queries in your database? A UNION query is the answer!
The syntax for a UNION query is very straightforward:
query 1 UNION [ALL] query2 [ORDER BY sort_order]
The optional ALL keyword indicates the UNION should include all duplicates which would be ignored by default. So to create our table of addresses, we can use the following SQL statement
SELECT Contact AS Name, Address, ZIP FROM Suppliers
UNION
SELECT EmployeeName AS Name, Address, ZIP FROM Employees
which would produce a table something like the following:
| Name | Address | ZIP |
| Sue Smith | 100 South St. | 12345 |
| David Jones | 2525 1st St. | 12345 |
| Troy Parker | 1100 Main St. | 23456 |
| Claire Smith-Jones | 400 East Main, Apt 5 | 56789 |
This doesn't seem like such a big deal. But what if the data tables were somewhat more heterogeneous? One interesting thing about the UNION operator is that in general, any two queries can be joined as long as they have the same number of columns. Note that the data types don't have to be the same. That's pretty useful!
In the past, we've discussed using one-to-one tables to store specific data such as information about a music CD or a book that is linked to the table of common data about items in a store. If you instead chose to create a table of CDs and a table of books, you could create a SQL statement like the following that will generate a single result set consisting heterogeneous data:
SELECT CatalogID, Price, Description, PlayingTime AS Custom1, NumOfTracks AS Custom2, Artist AS Custom3, Label AS Custom4 FROM CDs
UNION
SELECT CatalogID, Price, Description, PageCount AS Custom1, Author AS Custom2, PublishDate AS Custom3, ISBN AS Custom4 FROM Books
In this scenario, not only do the columns in the new set contain values that mean different things, they can even contain different types of data (playing time is probably a time or text field, the page count is probably an integer).
Why don't more people use UNION queries? And what about INTERSECT and EXCEPT? The truth is that these operations can be computationally intensive and there are often alternatives -- such as pulling out two data sets and using a C++ program to create the union, intersection, or difference set. But if your database supports them, they can occasionally be a lifesaver.
Starting in the next column, we're going to switch from discussing SQL for queries to discussing SQL statements for constructing tables and databases. That topic will occupy us for several columns and lead directly in to issues such as triggers, constraints, and all the myriad other tools that can save you time and effort. Until then, feel free to contact me with comments and questions.
John Paul Ashenfelter is president and CTO of TransitionPoint.com, a technology development and analysis group focusing on web database applications.
Read more aboutSQL columns.
Return to ONLamp.com.
John Paul Ashenfelter is president and CTO of TransitionPoint.com, a technology development and analysis group focusing on web database applications.
Copyright © 2009 O'Reilly Media, Inc.