-1

I am trying to count the number of times that name in NAME appear in BOSS_PATH.

The source

NAME    BOSS_PATH
---------------------
WIN     WIN
JOHN    WIN|JOHN
DANG    WIN|JOHN|DANG
JOSH    JOSH

The result I want

NAME    BOSS_PATH        COUNT_UNDER
--------------------------------------
WIN     WIN                   2
JOHN    WIN|JOHN              1
DANG    WIN|JOHN|DANG         0
JOSH    JOSH                  0

My thought is the query would be something like this

(SUM(the frequency of NAME appears in BOSS_PATH) - 1) AS COUNT_UNDER

But I still have problem writing this as an actual query.

  • T-SQL is a SQL dialect that is used by multiple products including Sybase, SQL Server, and Azure Synapse. What is the (R)DBMS product you are using here? [Edit] your question to tag that product, and (if relevant) the version tag for the product as well. – Thom A Feb 14 '23 at 10:02
  • 3
    Why not fix your design and normalise your data? – Thom A Feb 14 '23 at 10:02
  • 1
    Can you explain your results. John exists twice and Win 3 times. Just spotted the "- 1" in your "thoughts" - is that the reason? If so why 0 and not -1? Regardless the solution you need it to split the string and aggregate. – Stu Feb 14 '23 at 10:04
  • `PATH` contains strings, not users. `WIN|JOHN` is a string, not an array containing `WIN` and `JOHN`. If you wanted to store multiple values, the correct, easy and fast solution would be to use a separate table. It would take *less* space and allow easier and faster queries even without indexing. You can't index string values like those stored in `PATH` at all – Panagiotis Kanavos Feb 14 '23 at 10:04
  • In all SQL Server versions in mainstream support you can use `STRING_SPLIT` to split a string into multiple rows. That's more expensive than having a separate table. If you used JSON or XML instead of `|` you could use the JSON or XML query functions too. Neither would be as fast as a proper table design though. If you want hierarchical queries, using a `hierarchyid` column would provide easy queries and indexing. Or you could use an self-referencing design (ParentID->ID) and a recursive CTE – Panagiotis Kanavos Feb 14 '23 at 10:08
  • Sorry for the confusion. I changed column names `USER` to `NAME` and `PATH` to `BOSS_PATH`. – Win Wongsawatdichart Feb 14 '23 at 10:11
  • You've removed the tag [[tag:tsql]] but now we just have [[tag:sql]], so we have no idea what RDBMS you are using, or what dialect you need your solution in. – Thom A Feb 14 '23 at 10:14
  • The edit added no information. It looks like `PATH` is used to represent a hierarchy path, and you're asking how to find the children of each row. There are similar questions [like this one](https://stackoverflow.com/questions/16720236/get-direct-descendants-count-with-hierarchyid) using `hierarchyid`. In that case it's easy to find the parent with `Position.GetAncestor(1)`. By using a string as the hierarchy path, things have become a *lot* harder – Panagiotis Kanavos Feb 14 '23 at 10:14
  • You can find a row's descendants because their `PATH` values start with the parent's path, eg `mytable parent inner join mytable child on parent.PATH = child.PATH+ '%' AND parent.Name!=child.Name`. That can't take advantage of indexing though, so it will be very slow. – Panagiotis Kanavos Feb 14 '23 at 10:18
  • I think what you are asking for (based on the result table you've provided) is how many people report directly to each person. If that is the case then it would be much easier to query the source data that was used to generate the hierarchical path, rather than this interim dataset you seem to be using – NickW Feb 14 '23 at 12:28

1 Answers1

0

This will probably do it (though no idea how it will perform if there is a large volume of data). You can probably combine some of the steps but I've written it out as multiple CTEs to show the logical progression of the approach I used:

-- Get a CTE of all the names
WITH NAMES AS (
SELECT NAME FROM HIER_DATA
),
-- Get a CTE of all the paths
PATHS AS (
SELECT BOSS_PATH FROM HIER_DATA
),
-- Get a CTE of every name/path combination
-- The FULL JOIN syntax may need to be adjusted for your specific DBMS
CROSS_JOIN AS (
SELECT * FROM NAMES
FULL JOIN PATHS
)
-- Someone reports to the NAME if the BOSS_PATH contains the NAME followed by a '|'
-- If this is true give the record a value of 1 and sum by the NAME
SELECT NAME, SUM(CASE WHEN POSITION(NAME||'|',boss_path) > 0 THEN 1 ELSE 0 END) IS_PARENT
FROM CROSS_JOIN
GROUP BY NAME
ORDER BY NAME
;
NickW
  • 8,430
  • 2
  • 6
  • 19