Navigating SQL Hierarchies: Finding the Ultimate Parent

Untangling the web of parent-child relationships across multiple hierarchical levels can be challenging, yet it’s crucial for insightful data analysis. Frequently, we need to identify the apex of these hierarchies, the ‘ultimate parent’, in order to group data for analysis. However, the unpredictable number of levels within these hierarchies can complicate this task. In this post, I will show two queries that can help you accurately pinpoint the ultimate parent, regardless of the complexity of the hierarchical relationships.

Parent-Child Table Example

To get us started, let’s look at a simple example. Here’s a table showing parent-child relationships in a typical SQL hierarchy with just two columns:

child_idparent_id
1NULL
21
31
42
52
63
73
84

Advanced Approach: Using Recursive Queries to Trace Hierarchies

If available, a recursive query is an efficient way to get to the ultimate parent. A recursive query will keep querying data until we meet a certain condition. In our case, that condition is reaching the ultimate parent.

WITH RECURSIVE ultimate_parent AS ( -- find the ultimate parent
  SELECT child_id, parent_id, child_id AS Ultimateparent_id
  FROM account
  WHERE parent_id IS NULL  -- Assuming the top-level parent has NULL parent_id
  
  UNION ALL
  
  SELECT s.child_id, s.parent_id, r.Ultimateparent_id
  FROM account AS s
  JOIN ultimate_parent AS r ON s.parent_id = r.child_id
)

SELECT
    child_id,
    parent_id,
    Ultimateparent_id
FROM ultimate_parent

Alternative Approach: Navigating Hierarchies with Self-Joins

Using a recursive query isn’t an option, or you might be looking for a more straightforward, less advanced method. This alternative approach involves repeatedly joining the same table on itself – a technique known as self-join. Let’s take a look at this less complex, but equally effective method:

SELECT
  t1.child_id,
  t1.parent_id,
  COALESCE(t10.parent_id,t9.parent_id,t8.parent_id,t7.parent_id,t6.parent_id,t5.parent_id, t4.parent_id, t3.parent_id, t2.parent_id, t1.parent_id, t1.child_id) AS ultimate_parent_id
FROM
  account t1
LEFT JOIN
  account t2 ON t1.parent_id = t2.child_id
LEFT JOIN
  account t3 ON t2.parent_id = t3.child_id
LEFT JOIN
  account t4 ON t3.parent_id = t4.child_id
LEFT JOIN
  account t5 ON t4.parent_id = t5.child_id
LEFT JOIN
  account t6 ON t5.parent_id = t6.child_id
LEFT JOIN
  account t7 ON t6.parent_id = t7.child_id
LEFT JOIN
  account t8 ON t7.parent_id = t8.child_id
LEFT JOIN
  account t9 ON t8.parent_id = t9.child_id
LEFT JOIN
  account t10 ON t9.parent_id = t10.child_id

In this query, we start with our main table, account, aliased as t1. We then use a series of LEFT JOIN operations to join account back onto itself multiple times, each time going up one level in the parent-child hierarchy. We do this up to 10 times (t10) to cater to hierarchies up to 10 levels deep.

The COALESCE function then steps through the parent_ids from the highest level (t10) down to the first level (t1). The function returns the first non-null parent_id it encounters, which is the child_id of the ultimate parent.

The beauty of this query is that it can work in many SQL systems that might not support recursive queries. However, it does have a limitation: it can only traverse as many levels as the number of joins you include in your query. In our case, that’s 10 levels. If you have a deeper hierarchy, you’d need to add more joins.

This might not be as elegant or flexible as the recursive method, but it’s a solid approach that gets the job done when you need a less advanced solution.

Final Thoughts

Check out more Python tricks in this Colab Notebook or in my recent Python Posts.

Thanks for reading!


Posted

in

,

by

Tags: