I didn't know about recursive CTEs until Erwin Brandstetter mentioned them in the comment above. The concept is what it sounds like: a CTE that refers to itself, and has a base case so that recursion terminates. For your problem, a recursive CTE solution might look something like:
WITH accumulate_users AS (
-- Base case: the direct links from a user_id.
SELECT
user_id AS user_id,
ARRAY_AGG(linked_id) AS linked_accounts
FROM your_table
GROUP BY user_id
UNION ALL
-- Recursive case: transitively linked accounts.
SELECT
ARRAY_UNION(
accumulate_users.linked_accounts,
ARRAY_AGG(DISTINCT your_table.linked_id)
) AS linked_accounts
FROM accumulate_users
JOIN your_table ON CONTAINS(accumulate_users.linked_accounts, your_table.user_id)
GROUP BY accumulate_users.user_id
-- But there is no enforced termination condition, hopefully it just
-- ends at some point? This is part of why implementing recursive CTEs
-- is challenging, I think.
)
SELECT
user_id,
CARDINALITY(linked_accounts) AS count_linked_accounts,
linked_accounts
FROM accumulate_users
But, I haven't been able to test this query, because as detailed in another Stack Overflow Q&A Presto does not support recursive CTEs.
It is possible to traverse an arbitrary, but finite, number of links by repeatedly joining back to the table you have. Something like this, and I've included the second_, third_, fourth_degree_links only for clarity:
SELECT
yt1.user_id,
ARRAY_AGG(DISTINCT yt2.user_id) AS first_degree_links,
ARRAY_AGG(DISTINCT yt3.user_id) AS second_degree_links,
ARRAY_AGG(DISTINCT yt3.linked_user) AS fourth_degree_links,
ARRAY_UNION(
ARRAY_AGG(DISTINCT yt2.user_id),
ARRAY_UNION(ARRAY_AGG(DISTINCT yt3.user_id), ARRAY_AGG(DISTINCT yt3.linked_user))
) AS up_to_fourth_degree_links
FROM your_table AS yt1
JOIN your_table AS yt2 ON yt1.linked_user = yt2.user_id
JOIN your_Table AS yt3 ON yt2.linked_user = yt3.user_id
GROUP BY yt1.user_id
I've been working with a similar set of data, although I have the original identifiers as part of the raw data set. In other words the 'email' and 'phone number' in your example. I found it helpful to create a table that groups user ids by these connecting identifiers:
CREATE TABLE email_connections AS
SELECT
email,
ARRAY_AGG(DISTINCT user_id) AS users
FROM source_table
GROUP BY email
The same arbitrary-but-finite-depth set of links can then be computed by looking for intersections between the user arrays:
SELECT
3764350 AS user_id,
FLATTEN(ARRAY_AGG(ARRAY_UNION(emails1.users, ARRAY_UNION(emails2.users, ARRAY_UNION(emails3.users, emails4.users))))) AS all_users,
CARDINALITY(FLATTEN(ARRAY_AGG(ARRAY_UNION(emails1.users, ARRAY_UNION(emails2.users, ARRAY_UNION(emails3.users, emails4.users)))))) AS count_all_users
FROM email_connections AS emails1
JOIN email_connections AS emails2 ON CARDINALITY(ARRAY_INTERSECT(emails1.users, emails2.users)) > 0
JOIN email_connections AS emails3 ON CARDINALITY(ARRAY_INTERSECT(emails2.users, emails3.users)) > 0
JOIN email_connections AS emails4 ON CARDINALITY(ARRAY_INTERSECT(emails3.users, emails4.users)) > 0
WHERE CONTAINS(emails1.users, 3764350)
GROUP BY 1
Calculating links to an arbitrary depth is a good use case for a graph database technology like Neo4j or JanusGraph. That's what I'm now looking at to address this "user linking" problem.