Brief Introduction to Processing Hierarchical Data in Oracle


Introduction

The purpose of this document is to provide a brief introduction on how to process hierarchical data in Oracle using the CONNECT BY syntax. If the reader desires additional detail, he can consult the references at the end of this document.

We will use the employee / manager relationship for the hierarchy.

When using Oracle's CONNECT BY syntax, there are two independent settings:

  1. Selecting whether to start with a manager or employee.

  2. Selecting whether to ascend (PRIOR manager) or descend (PRIOR employee).

This leads to 4 possible combinations described in the table below.

Table 1. Four Possible Combinations Associated with CONNECT BY

(PRIOR employee) [Descend] (PRIOR manager) [Ascend]
START WITH employee Scenario 1 Scenario 2
START WITH manager Scenario 3 Scenario 4

Once we have examined the 4 scenarios in detail, we will then examine the concepts of "bridge tables" and "hierarchical order".

Table and Data To Experiment With

To demonstrate the concepts, a table is created and populated with data. The needed create table command is listed below. The hierarchical data can be viewed visually in either tree form [Excel] or recursive parent pointer form [Excel]. Please view the data at this piont in time. The remainder of the document assumes that you have.


CREATE TABLE temp_rl_recursive_hierarchy (
	employee_id		VARCHAR2(5 BYTE)	NOT NULL,
	manager_id		VARCHAR2(5 BYTE)	    NULL);


-- An employee should be listed only once
CREATE UNIQUE INDEX uidx_temp_rl_r_hierarchy_emp
ON  temp_rl_recursive_hierarchy (employee_id);


-- A manager can be listed multiple times.
-- Want an index so that have better performance 
--	when scan the hierarchy.
CREATE INDEX idx_temp_rl_r_hierarchy_mgr
ON  temp_rl_recursive_hierarchy (manager_id);

The test data is generated using the following criteria:

  1. There are no cycles (also known as a loops).

  2. The top parent is identified by having his manager have a NULL value.

  3. No pattern exists in the data that can be used to traverse the hierarchy. For example, a balanced tree is not used.

    1. There is at least one node which has no children at each level.

    2. Each level has at least one node which has two children.

    3. Each level has at least one node which has three children.

Scenario 1: START WITH employee - (PRIOR employee) [Descend]

The needed select statement is listed below. The output of the select statement for employee identifiers 00001, 00002, and 00017 has been generated.


SELECT
	employee_id,
	manager_id,
	LEVEL
FROM
	temp_rl_recursive_hierarchy
CONNECT BY
	( PRIOR employee_id ) = manager_id
START WITH employee_id = '<employee identifier>';

Scenario 2: START WITH employee - (PRIOR manager) [Ascend]

The needed select statement is listed below. The output of the select statement for employee identifiers 00001, 00002, and 00022 has been generated.


SELECT
	employee_id,
	manager_id,
	LEVEL
FROM
	temp_rl_recursive_hierarchy
CONNECT BY
	employee_id = (PRIOR manager_id)
START WITH employee_id = '<employee identifier>';

Scenario 3: START WITH manager - (PRIOR employee) [Descend]

The needed select statement is listed below. The output of the select statement for employee identifiers 00001, 00002, and 00017 has been generated.


SELECT
	manager_id,
	employee_id,
	LEVEL
FROM
	temp_rl_recursive_hierarchy
CONNECT BY
	( PRIOR employee_id ) = manager_id
START WITH  manager_id = '<employee identifier>';

Scenario 4: START WITH manager - (PRIOR manager) [Ascend]

The needed select statement is listed below. The output of the select statement is "ERROR: ORA-01436: CONNECT BY loop in user data" for non-leaf nodes. For leaf nodes, no rows are returned. For a definition of leaf nodes, refer to reference 2. This is logical because if want to ascend the hierarchy, should start with an employee and not a manager.


SELECT
	manager_id,
	employee_id,
	LEVEL
FROM
	temp_rl_recursive_hierarchy
CONNECT BY
	manager_id = (PRIOR manager_id)
START WITH manager_id = '<employee identifier>';

Bridge Table: Alternative To Adjacency List

The table created earlier (temp_rl_recursive_hierarchy) is an adjacency list. In other words, the edges of the graph are represented as pairs of nodes. For more information on adjacency lists, refer to reference 3.

It should be noted that the most common way hierarchies are represented in data warehouses is using a "bridge table" and not adjacency lists. For further details on the concept of a bridge table, see reference 4. The bridge table solution trades away storage space for greater speed by creating a record for every path enumeration.

Hierarchical Order

The SELECT statement below retrieves data in Oracle's hierarchical order [Excel]. Note that the SELECT statements is for traversing down the hierarchy.


SELECT
	employee_id,
	manager_id,
	LEVEL
FROM
	temp_rl_recursive_hierarchy
CONNECT BY
	( PRIOR employee_id ) = manager_id;

The SELECT statement below retrieves data in Oracle's hierarchical order and then applies an ORDER BY [Excel].


SELECT
	employee_id,
	manager_id,
	LEVEL levels_from_root_node
FROM
	temp_rl_recursive_hierarchy
CONNECT BY
	( PRIOR employee_id ) = manager_id
ORDER BY
    employee_id ASC,
    manager_id ASC, 
    levels_from_root_node ASC;

Note that CONNECT BY is incompatible with ORDER BY unless the keyword SIBLINGS is used.

References

  1. Oracle Database - SQL Reference - 10g Release 2 (10.2)

  2. Mastering Oracle SQL, 2nd Edition - By Sanjay Mishra, Alan Beaulieu - ISBN: 0-596-00632-2 - Chapter 8: Hierarchical Queries [Amazon]

  3. Trees and Hierarchies in SQL for Smarties - By Joe Celko - ISBN: 1-558-60920-2 [Amazon]

  4. The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling (Second Edition) - By Ralph Kimball, Margy Ross - ISBN: 0-471-20024-7 - Chapter 6: Customer Relantionship Management [Amazon]