1

I have a table which contains folders paths. This table contains four columns: DirID, BaseDirID, DirLevel and DisplayPath. DirID - The folder's ID. BaseDirID - The ID of the first folder in the hierarchy. So all the folders (paths) from the same hierarchy share the same value in this column. DirLevel - The depth of the folder. DisplayPath - The folder's path.

I need to find all the "gaps" between those folders in the hierarchy.

Sample data for example:

DirID BaseDirID DirLevel DisplayPath
1   1  1  'A'
2   1  3  'A\B\C'
3   1  5  'A\B\C\D\E'
4   1  3  'A\B\F'
5   1  5  'A\B\F\G\H'
6   2  1  'U'
7   2  3  'U\V\W'
8   2  5  'U\V\W\X\Y'
9   2  3  'U\V\M'
10  2  5  'U\V\M\L\O'

So we need to find the following data:

BaseDirID DisplayPath
1   'A\B'
1   'A\B\C\D'
1   'A\B\F\G'
2   'U\V'
2   'U\V\W\X'
2   'U\V\M\L'

Comments:

a. This table contains more than 250,000 records of folders, so we seek for the most efficient way to do so, otherwise the script will be stuck for long time, time we don't have.

b. I don't have list of all folders. What I have are the "root" folders and the "leafs" folders which I need to find the "gaps" between them in the hierarchy.

c. The table can contains more than one hierarchy and we need to find the "gaps" in all of the hierarchies.

d. Each of the hierarchies can split, as you can see in the sample data the first hierarchy splits to two folders paths from the 'A\B' folder: 'A\B\C' and 'A\B\F'. And the second hierarchy splits to two folders paths from the 'U\V' folder: 'U\V\W' and 'U\V\M'. And we need to find all the "gaps" even in such cases when the hierarchies split.

e. The sql server version is: SQL 2012 SP3.

This question is a continuation question for the question which asked in the following link: SQL: Find missing hierarchy Folders (Paths) in a table Our question include also the 4th comment which appear in bold.

I saw that there is a new column type which called "hierarchyid" (start from sql server 2008), which I thought that might help us - https://learn.microsoft.com/en-us/sql/t-sql/data-types/hierarchyid-data-type-method-reference What do you think?

Thanks in advance.

Community
  • 1
  • 1
Nuriel Zrubavely
  • 463
  • 3
  • 13

1 Answers1

1

Using this added path (11,2,'U\V\Z\L\O\Q\R\S\T') to show multiple missing folders in a path:

with cte as (
select BaseDirID, DisplayPath = left(DisplayPath,len(DisplayPath)-charindex('\',reverse(DisplayPath)))
from t
where DirLevel > 1
  and not exists (
  select 1 
  from t i
  where t.BaseDirId = i.BaseDirId
    and i.DisplayPath = left(t.DisplayPath,len(t.DisplayPath)-charindex('\',reverse(t.DisplayPath)))
    )
union all 
select BaseDirID, DisplayPath = left(DisplayPath,len(DisplayPath)-charindex('\',reverse(DisplayPath)))
from cte t
where not exists (
  select 1 
  from t i
  where t.BaseDirId = i.BaseDirId
    and i.DisplayPath = left(t.DisplayPath,len(t.DisplayPath)-charindex('\',reverse(t.DisplayPath)))                                                   
    )
)
select distinct * 
from cte

rextester demo: http://rextester.com/CEVGZ96613

returns:

+-----------+-----------------+
| BaseDirID |   DisplayPath   |
+-----------+-----------------+
|         1 | A\B             |
|         1 | A\B\C\D         |
|         1 | A\B\F\G         |
|         2 | U\V             |
|         2 | U\V\M\L         |
|         2 | U\V\W\X         |
|         2 | U\V\Z           |
|         2 | U\V\Z\L         |
|         2 | U\V\Z\L\O       |
|         2 | U\V\Z\L\O\Q     |
|         2 | U\V\Z\L\O\Q\R   |
|         2 | U\V\Z\L\O\Q\R\S |
+-----------+-----------------+
SqlZim
  • 37,248
  • 6
  • 41
  • 59
  • I do need recursive missing folders. – Nuriel Zrubavely May 16 '17 at 20:31
  • Is the solution efficient? Because It using CharIndex() multipule times on each row in a table which contains 250,000 rows... – Nuriel Zrubavely May 17 '17 at 04:25
  • I tried this solution on table of size ~200k. The query ran more than 20 minutes until I canceled it! Can you please give me more efficient solution? I wrote it very clearly in the question: "This table contains more than 250,000 records of folders, so we seek for the most efficient way to do so". – Nuriel Zrubavely May 17 '17 at 08:38
  • @NurielZrubavely please include actual table schema ddl and associated indexes, constraints, and relationships in the question along with representative example data. – SqlZim May 17 '17 at 11:24
  • My table didn't have any index or primary key. I now added Non-Clustered index on the `BaseDirID` column, and now it runs in 45 seconds. But its still not efficient enough... Please bear in mind that we can make any changes we want to the table. I added now to the question that we have another column of `DirLevel` denoting the depth of the folder. – Nuriel Zrubavely May 17 '17 at 12:02
  • @NurielZrubavely 45 seconds instead of longer than 20 minutes is a pretty significant difference in performance. Again, please include actual table schema ddl and associated indexes, constraints, and relationships in the question along with representative example data. – SqlZim May 17 '17 at 12:23
  • I need you guys to recommend me about the table structure. The representative example data already exist in the question. – Nuriel Zrubavely May 17 '17 at 13:00
  • @NurielZrubavely Please provide the `create table` statement and related indexes, constraints, triggers, keys, etc that relate to the table. – SqlZim May 17 '17 at 14:38
  • CREATE TABLE FolderPaths ( DirId BIGINT, BaseDirId BIGINT, DirLevel BIGINT, DisplayPath NVARCHAR(max), ) CREATE NONCLUSTERED INDEX IX_FolderPaths ON FolderPaths (BaseDirId); – Nuriel Zrubavely May 17 '17 at 20:45
  • @NurielZrubavely Does `AccessPath` need to be `nvarchar(max)`, do the paths exceed 4000 characters in length? Do they need to be unicode or does `varchar()` work instead? – SqlZim May 17 '17 at 20:48
  • There is no limit for the length of the folders paths. It need to support all America and Europe languages characters. – Nuriel Zrubavely May 17 '17 at 20:52