Ordering tables by foreign key dependency in SQL

As a SQL developer I love constraints, and fewer constraints come in handier than foreign keys. Mappings between two tables become defined to both human and server: other developers understand how your tables come together, and the server prevents nonsense from being written to tables.

This being said, foreign keys become an administrative hassle when replicating data between other databases or from backup. Any table that references any other table will have to be loaded after its precedent to avoid foreign key violations, and inexplicably few database systems allow for a means to order tables by their dependencies. This doesn’t mean to say that doing so is impossible though.

Suppose we had the following database design:

sqlblog

 

In Microsoft SQL Server there is helpfully the foreign_key_columns system view which will list all foreign keys, their parent objects/columns and their referenced object/columns. Used in conjunction with other system views like tables, this becomes an effective means of determining which tables to load first.

The following method populates a temporary table, #AllTables, with every table on the current database and assigns an appropriate dependency_number value:

 /* 
 
 *  Temporary table used such that
 
 *    the compiled list can be
 
 *    freely interacted with later
 
 */
 

CREATE TABLE #AllTables(
    [object_id] INT NOT NULL
     ,[name] NVARCHAR(128) NOT NULL
     ,[schema_name] NVARCHAR(128) NOT NULL
     ,[dependency_number] SMALLINT NULL
);
 
-- Step #1
INSERT INTO #AllTables([object_id],[name],[schema_name])
SELECT t.[object_id],t.[name],s.[name]
FROM [sys].[tables] t
INNER JOIN [sys].[schemas] s ON t.[schema_id] = s.[schema_id]
ORDER BY t.[object_id];
 
-- Step #2
-- Dependency #0 tables
 
UPDATE #AllTables
SET [dependency_number] = 0
WHERE [object_id] NOT IN(
 SELECT [parent_object_id]
 FROM [sys].[foreign_key_columns]
 );
 
-- Step #3
-- Dependency #n tables
 
DECLARE @n SMALLINT;
SELECT @n = 1;
WHILE EXISTS(
  SELECT * FROM #AllTables
  WHERE [dependency_number] IS NULL
) AND(@n < 50)          
-- Safety first
 
BEGIN
 
  UPDATE o
   SET o.[dependency_number] = so.[dependency_number]
    FROM #AllTables o
    INNER JOIN(
      SELECT po.[object_id],MAX(ro.[dependency_number]) + 1
      FROM [sys].[foreign_key_columns] fkc
     INNER JOIN #AllTables po
       ON fkc.[parent_object_id] = po.[object_id]
     INNER JOIN #AllTables ro
       ON fkc.[referenced_object_id] = ro.[object_id]
       WHERE fkc.[parent_column_id] != fkc.[referenced_column_id]
    AND po.[dependency_number] IS NULL
    AND NOT EXISTS(
       SELECT *
       FROM [sys].[foreign_key_columns] sfkc
       INNER JOIN #AllTables sro
         ON sfkc.[referenced_object_id] = sro.[object_id]
       WHERE sfkc.[parent_object_id] = po.[object_id]
       AND sro.[dependency_number] IS NULL
    
        GROUP BY po.[object_id]
       ) so([object_id],[dependency_number])
        ON o.[object_id] = so.[object_id];
 
   SET @n = @n + 1;
 
END;

First of all we would need the tables. This list can be acquired easily enough from the tables system view, after which we could use our own table henceforth. Of these tables we can immediately mark those which do not exist in foreign_key_columns as a parent table. These will have a dependency number of #0. Going back to our example database, the tables Person and Product will have dependency #0.

Then we enter the loop. Of the tables that remain unmarked, we intend to find all the tables which refer directly to our dependency #0 tables to begin with, then our dependency #1 tables and so forth. The dependency number of a table should be the highest there is, plus one. The NOT EXISTS clause is there to guarantee that only tables which do not depend on tables with unknown dependency numbers. Without this, OrderProduct would also be given a dependency of #1, which would be incorrect: it refers to CustomerOrder.

After running on our example database we get:


 name             dependency_number
---------------- -----------------
Person           0
Product          0
Price            1
CustomerOrder    1
OrderProduct     2

From now on if ever you need to load data into such a database, all that you would need to do is load Person and Product data into the database first, then Price andCustomerOrder data, and so on. Deleting data/truncating tables? Easy – just work in reverse!

The loop assumes that no circular dependencies exist (e.g. two tables referencing each other). If they do, the loop will break out after the 49th iteration anyway. Alas, you will have no alternative to handling such dependencies other than to disable the foreign keys with:


ALTER TABLE <table_name> NOCHECK CONSTRAINT <foreign_key_name>

Remember to restore your foreign keys afterwards though!

Leave a Reply

Your email address will not be published. Required fields are marked *

*