O'Reilly Databases

oreilly.comSafari Books Online.Conferences.

We've expanded our coverage and improved our search! Search for all things Database across O'Reilly!

Search Search Tips

advertisement
AddThis Social Bookmark Button

Listen Print Discuss Subscribe to Databases Subscribe to Newsletters

Hierarchical SQL
Pages: 1, 2

Searching for Superiors

Given a node, find all of its superiors. This requires disassembling the path back into the identifiers that constructed it. We can use a table of sequential integers to find the required substrings:

SELECT SUBSTRING (P1.path_string 
            FROM (seq * CHAR_LENGTH(P1.emp_id)) 
             FOR CHAR_LENGTH(P1.emp_id)) AS emp_id
   FROM Personnel_OrgChart AS P1,
        Sequence AS S1  
  WHERE P1.emp_id = :search_emp_id
    AND S1.seq <= CHAR_LENGTH(path_string)/CHAR_LENGTH(emp_id);

The problem is that this does not tell you the relationships among the superiors, only who they are. Those relationships are actually easier to report.

SELECT P2.*
   FROM Personnel_OrgChart AS P1,
        Personnel_OrgChart AS P2
  WHERE P1.emp_id = :search_emp_id
    AND POSITION (P2.path_string IN P1.path_string) = 1;

Deleting a Subtree

Given a node, delete the subtree rooted at that node. This can use the same predicate as for finding the subordinates.

DELETE FROM Personnel_OrgChart
 WHERE path_string LIKE (SELECT path_string FROM Personnel_OrgChart
                   WHERE emp_id = :dead_guy) || '%';

Deleting a Single Node

Once more we have to face the problem that removing a non-leaf node from a tree means that it is no longer a tree. We need to have rules for changing the structure.

Assuming that we simply move everyone up a level in the tree, we can first remove that node emp_id from the Personnel_OrgChart table and then remove that emp_id from the paths of the other nodes.

BEGIN ATOMIC
DELETE FROM Personnel_OrgChart
 WHERE emp_id = :dead_guy;
UPDATE Personnel_OrgChart
   SET path_string = REPLACE (path_string, :dead_guy, '');
END;

There are other methods of rebuilding the tree structure after deleting a node: promoting a subordinate based on some criteria to the newly vacant position or leaving a vacancy (a dummy employee named {{vacant}}) in the organizational chart are options. These options usually require some combination of node deletions and insertions.

Inserting a New Node

When inserting a new node, simply concatenate the new emp_id to the end of the path of its parent node.

INSERT INTO Personnel_OrgChart
VALUES (:new_guy, :new_emp_id,
        (SELECT path_string 
           FROM Personnel_OrgChart 
          WHERE emp_id = :new_guy_boss)
         || :new_emp_id);

This basic statement can work for insertion of a subtree with some modification:

INSERT INTO Personnel_OrgChart
SELECT N1.emp, N1.emp_id,
        (SELECT path_string 
           FROM Personnel_OrgChart 
          WHERE emp_id = :new_tree_boss)
         || N1.emp_id
  FROM NewTree AS N1;

Splitting Up a Path String

The path string contains information about the nodes in the path it represents, so you will often want to split it back into the nodes it represents. This is easier to do if the path string uses a separator character, such as a comma or slash. I will use a slash so this will look like a directory path in UNIX. You will also need a table called Sequence that contains a set of integers from 1 to (n).

CharIndex(<search string>, <target string>, <starting position>) is a vendor version of the Standard SQL function POSITION(<search string> IN <target string>). It begins the search at a position in the target string, thus when the <starting position< = 1, the two are equivalent. One definition is:

CREATE FUNCTION CharIndex (IN search_str VARCHAR(1000), 
    IN target VARCHAR(1000), IN start_point INTEGER) RETURNS INTEGER 
RETURN
   (POSITION (search_str 
              IN SUBSTRING (target FROM start_point)) + start_point -1);

Version 1

SELECT CASE WHEN SUBSTRING('/' || P1.path_string || 
                                      '/' FROM S1.seq FOR 1) = '/'
             THEN SUBSTRING('/' || P1.path_string || '/' FROM (S1.seq +1) 
                FOR CharIndex('/', '/' || P1.path_string || '/', S1.seq +1) 
                   - S1.seq - 1)
             ELSE NULL END AS emp_id
   FROM Sequence AS S1, Personnel_OrgChart AS P1
  WHERE S1.seq BETWEEN 1 AND CHAR_LENGTH('/' || P1.path_string || '/') - 1
    AND SUBSTRING('/' || P1.path_string || '/' FROM S1.seq FOR 1) = '/'

Version 2

This uses the same idea, but with two sequence numbers to bracket the emp_id embedded in the path string. It also returns the position of the subordinate emp_id in the path.

CREATE VIEW Breakdown (emp_id, step_nbr, subordinate_emp_id)
AS
SELECT emp_id, 
       COUNT(S2.seq),
       SUBSTRING ('/' || P1.path_string || '/', MAX(S1.seq || 1)
                   FROM (S2.seq - MAX(S1.seq || 1))
  FROM Personnel_OrgChart AS P1, Sequence AS S1, Sequence AS S2
 WHERE SUBSTRING ('/' || P1.path_string || '/', S1.seq, 1) = '/'
   AND SUBSTRING ('/' || P1.path_string || '/', S2.seq, 1) = '/'
   AND S1.seq < S2.seq
   AND S2.seq <= CHAR_LENGTH(P1.path_string) +1
 GROUP BY P1.emp_id, P1.path_string, S2.seq;

The S1 and S2 copies of Sequence help to locate bracketing pairs of separators. The entire set of substrings located between them is extracted in one step. The trick is to be sure that the left-hand separator of the bracketing pair is the closest one to the second separator. The step_nbr column tells you the relative position of the subordinate employee to the employee in the path.

Version 3

This version is the same as above, but more concise and easy to comprehend.

SELECT SUBSTRING('/' || P1.path_string || '/' 
                  FROM S1.seq +1
                   FOR CharIndex('/', 
                                 '/' || P1.path_string || '/', 
                                 S1.seq +1)- S1.seq - 1) AS node
   FROM Sequence AS S1, Personnel_OrgChart AS P1
  WHERE SUBSTRING('/' || P1.path_string || '/'
                   FROM S1.seq FOR 1) = '/'
    AND seq < CHAR_LENGTH('/' || P1.path_string || '/');

Version 4

Here's another way, using the LIKE predicate:

SELECT SUBSTRING(P1.path_string 
                  FROM seq +1 
                   FOR CharIndex('/', P1.path_string, S1.seq +1) - (S1.seq +1))
   FROM Sequence AS S1
        INNER JOIN 
        (SELECT '/' || path_string || '/' 
           FROM Personnel_OrgChart) AS P1.(path_string)
        ON S1.seq <= CHAR_LENGTH(P1.path_string)
           AND SUBSTRING(P1.path_string 
                         FROM S1.seq 
                          FOR CHAR_LENGTH(P1.path_string)) 
               LIKE '/_%';

The Edge Enumeration Model

So far, we have seen the node enumeration version of the path enumeration model. In the edge enumeration model, the "driving directions" for following the path from the root to each node are integers. The path column contains a string of the edges that make up a path from the root (Albert) to each node, numbering them from left to right at each level in the tree.

Personnel_OrgChart
emp_name edge_path  
==================
'Albert'   '1.'
'Bert'     '1.1.'
'Chuck'    '1.2.' 
'Donna'    '1.2.1.' 
'Eddie'    '1.2.2.' 
'Fred'     '1.2.3.'

For example, Donna is the second child of the first child (Chuck) of the root (Albert). This assigns a partial ordering to the nodes of the trees. The main advantage of this notation is that you do not have to worry about long strings, but there is no real difference in the manipulations. The numbering does give an implied ordering to siblings that might have meaning.

Summary

Path enumeration models have problems with deeper trees and with trees that do not have a natural way of naming the nodes or edges. Maintaining proper constraints can be difficult in actual SQL products because of the lack of support for Full SQL-92 features.

On the plus side, path enumeration is a very fast and natural technique for trees without great depth and frequent changes to the structure. If you perform most of your searches and aggregations from the root down, it can handle surprisingly wide tree structures.

Joe Celko was a member of the ANSI X3H2 Database Standards Committee from 1987 to 1997 and helped write the ANSI/ISO SQL-89 and SQL-92 standards.


Return to ONLamp.com.


Have a question about this technique or strategies to port this to another database? Let us know here.
You must be logged in to the O'Reilly Network to post a talkback.
Post Comment
Full Threads Oldest First

Showing messages 1 through 8 of 8.

  • Great but Some More Commentary Would Be Helpful
    2009-01-19 06:43:01  anywaste [Reply | View]

    This article is very interesting and suggests what looks like an excellent solution to this problem, however it would be a bit easier to follow if there were a bit more explanation of exactly what each query is doing -- both because some of us may have differing levels of familiarity with the syntax required, and simply because the commands used are complex.
  • path strings
    2004-11-13 06:37:26  r937 [Reply | View]

    disclaimer: i have not yet implemented any of the storage methods other than the adjacency model (primarily because i could never understand them well enough to be able to write a query without having several reference books available -- they're just too hard)


    maybe it's me, but the "path-string" column sure looks like the dreaded bogeyman to-be-avoided-at-all-cost multiple-values-in-a-single-column non-first-normal-form terrible design that all sql gurus, including some guy named celko, have been warning us for years never to implement...


    what is the essential difference between 'ABD' and 'A/B/D' and 'A,B,D'?


    yet we jump all over php programmers who want to store values like 'A,B,D' in a column


    why aren't the path strings normalized into multiple rows like they should be?


    or would that be too inconvenient?


    ;o)



  • Hierarchical SQL
    2004-10-26 14:45:13  dev@ [Reply | View]

    http://www.codeproject.com/cs/database/Trees_in_SQL_databases.asp
  • How about using integer type for path?
    2004-08-19 01:44:10  Carfield Yim [Reply | View]

    I think we can integer type for path, let say we allow 100 organizations in same level. We can multiply 100 for each level of organization like:

    Albert
    / \
    / \
    Bert Chuck
    / | \
    / | \
    / | \
    Donna Eddie Fred

    Personnel_OrgChart
    emp_name emp_id path
    ===============================
    'Albert' 'A' 10000
    'Bert' 'B' 100
    'Chuck' 'C' 200
    'Donna' 'D' 201
    'Eddie' 'E' 202
    'Fred' 'F' 203

    Then we can select all item under under 'C' as:
    select * from Personnel_OrgChart where path > 200 and path < 300

    And all the other operation can do in interger calculation. How do you think about this? As it don't involve String operation, will it run much faster?
    • How about using integer type for path?
      2004-08-19 05:21:31  --CELKO-- [Reply | View]

      I cover versions of that method in the book, too. It works fairly well for static situations in the real world. That is, if the tree does not get too deep or too wide.

      You can also use a bigger gap than one in a nested sets model to speed up insertion by saving reorganization.

  • tangentially related....
    2004-08-09 04:09:42  bazzargh [Reply | View]

    I came across an interesting paper the other day describing indexing techniques to accelerate XPaths on relational DBs...

    http://www.inf.uni-konstanz.de/~grust/files/xpath-accel.pdf

    ...they assign a 2d coordinate to a node from its order in a depth-first traversal of the tree (the node count on the way in, and the node count on the way out). In these coordinates, the usual XPath partition of the tree into preceding, following, ancestor, descendant axes becomes the 4 quadrants around a node on the graph. This is obviously ideal for db's with spatial indexes.

    In their usage, there were no updates (the obvious problem with this algorithm) so its not a panacea for update, aggregation performance. I guess that's true for all tree structures on RDBMS, you have to choose the one that suits your usage scenarios. A nice idea nonetheless.
  • You might want to run the numbers ...
    2004-08-07 10:14:24  --CELKO-- [Reply | View]

    In most hierarchical problems, more time is spent doing queries and not insert/delete operations. If you need to do random insertions, then use the nested interval model; it has the advantage of adjacency list insertion speeds and nested set subtree searches, but trades this for more complex procedures.

    The adjacency list model is as fast as possible on insertion; a new node is a simple INSERT INTO statement. Deleting a node is the real problem because you have to connect the subordinates back into the tree! Also, this model requires procedural code to get data out of it for hierarchical aggregations, and to do searches for subtrees.

    The nested sets model is meant for queries, but it updates much faster than people first think. The tree structure is in a table with three columns (node_id, lft, rgt), so a large tree fits into a few physical disk pages, or fits completely into a covering index.

    If you have an ordered (what Sybase/SQL Server called "clustered") index, updates to the structure are done at table scan speeds.

    I have tested nested sets on ~250,000 nodes at depths of 10 and 15 levels and the numbers were quite good. The test was to sum a numeric value of all subordinates at each node, with a grand total at the root. The second test was to randomly delete and insert 20,000 nodes, then re-organize the tree.

    Depth does not matter in a Nested Set model-- it depends on the number of nodes (rgt-lft+1) in each sub-tree. Over an order of magnitude faster on subtree and aggregation queries and a bit over twice as long on insertion, once the disk pages were in cache.

    In all hierarchy models, the constraints require to maintain a hierarchy are a bit complicated and are based on graph theory. The current crop of programmers does not seem to know about graph theory or constraints, so they do not bother to write any, and then the data integrity gets trashed.
  • Same old non-solution
    2004-08-06 19:07:45  poodlebum [Reply | View]

    Your summary should be reworded and placed at the beginning of the article:

    "Neither the adjaceny list model, path enumeration, nor nested sets scale if your heirarchy is deep and/or wide, or if you must insert/update frequently. Nevertheless, presented here is the same old non-solution to your real world problems."

    I'd be interested in any future article that describes how to handle heirarchies in interesting, real world data.


Tagged Articles

Post to del.icio.us

This article has been tagged:

sql

Articles that share the tag sql:

How to Misuse SQL's FROM Clause (74 tags)

Configuring Database Access in Eclipse 3.0 with SQLExplorer (28 tags)

Managing Many-to-Many Relationships with PL/pgSQL (18 tags)

Hierarchical SQL (16 tags)

The Outer Limits of SQL JOINs (14 tags)

View All

hierarchical

Articles that share the tag hierarchical:

Hierarchical SQL (4 tags)

View All

database

Articles that share the tag database:

MySQL FULLTEXT Searching (54 tags)

Live Backups of MySQL Using Replication (53 tags)

Advanced MySQL Replication Techniques (53 tags)

Dreaming of an Atom Store: A Database for the Web (49 tags)

How to Misuse SQL's FROM Clause (38 tags)

View All

articles

Articles that share the tag articles:

What Is Web 2.0 (187 tags)

Ajax on Rails (21 tags)

Understanding MVC in PHP (13 tags)

Art and Computer Programming (10 tags)

Using Java Classes in Perl (9 tags)

View All

onlamp

Articles that share the tag onlamp:

Live Backups of MySQL Using Replication (4 tags)

Rolling with Ruby on Rails, Part 2 (3 tags)

Building a FreeBSD Build System (3 tags)

Design by Wiki (2 tags)

Enhanced Interactive Python with IPython (2 tags)

View All

Sponsored Resources

  • Inside Lightroom

Related to this Article

Understanding Oracle Clinical Understanding Oracle Clinical
by Joan M. Johnson
May 2007
$9.99 USD

Inside SQLite Inside SQLite
by Sibsankar Haldar
April 2007
$9.99 USD

Advertisement
O'Reilly Media

©2009, O'Reilly Media, Inc.
(707) 827-7000 / (800) 998-9938
All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners.
About O'Reilly
Academic Solutions
Authors
Contacts
Customer Service
Jobs
Newsletters
O'Reilly Labs
Press Room
Privacy Policy
RSS Feeds
Terms of Service
User Groups
Writing for O'Reilly
Content Archive
Business Technology
Computer Technology
Google
Microsoft
Mobile
Network
Operating System
Digital Photography
Programming
Software
Web
Web Design
More O'Reilly Sites
O'Reilly Radar
Ignite
Tools of Change for Publishing
Digital Media
Inside iPhone
O'Reilly FYI
makezine.com
craftzine.com
hackszine.com
perl.com
xml.com

Partner Sites
InsideRIA
java.net
O'Reilly Insights on Forbes.com