Grouping connections between users in a network based on number of mutuals

Question

I have a list of tuples. Each tuple represents a person in a social network. The first item is their id or "name". The second is a dictionary; each key is another person in the network with whom they have mutual connections, and its value is how many mutuals they have together.

network = [
    (6, {3: 3, 4: 3, 7: 2, 1: 3, 11: 2}),
    (1, {7: 3, 11: 4, 6: 3, 4: 3}),
    (4, {3: 2, 6: 3, 1: 3, 11: 2, 12: 3}),
    (2, {9: 4, 8: 2, 10: 2, 5: 2}),
    (12, {3: 2, 4: 3}),
    (3, {5: 2, 8: 2, 12: 2, 4: 2, 7: 2, 6: 3}),
    (10, {2: 2, 9: 3, 8: 3, 5: 2}),
    (5, {3: 2, 8: 3, 9: 4, 10: 2, 2: 2}),
    (13, {}),
    (8, {2: 2, 9: 3, 10: 3, 3: 2, 5: 3}),
    (7, {3: 2, 6: 2, 1: 3}),
    (11, {1: 4, 6: 2, 4: 2}),
    (9, {2: 4, 8: 3, 10: 3, 5: 4}),
]

If two people have 1, 2, or 3 mutuals, they might know each other. If they have 4 mutuals, they probably know each other. I want to process this list so that I can determine who might/probably knows whom, resulting in output like this:

Name: 1
    Might know: 4, 6, 7
    Probably knows: 11
Name: 2
    Might know: 5, 8, 10
    Probably knows: 9
Name: 3
    Might know: 4, 5, 6, 7, 8, 12
    Probably knows: 
Name: 4
    Might know: 1, 3, 6, 11, 12
    Probably knows: 
Name: 5
    Might know: 2, 3, 8, 10
    Probably knows: 9
Name: 6
    Might know: 1, 3, 4, 7, 11
    Probably knows: 
Name: 7
    Might know: 1, 3, 6
    Probably knows: 
Name: 8
    Might know: 2, 3, 5, 9, 10
    Probably knows: 
Name: 9
    Might know: 8, 10
    Probably knows: 2, 5
Name: 10
    Might know: 2, 5, 8, 9
    Probably knows: 
Name: 11
    Might know: 4, 6
    Probably knows: 1
Name: 12
    Might know: 3, 4
    Probably knows:

Here is the code I'm currently using to process it:

might = []
probably = []
for person in network:
    name = person[0]
    connections = person[1]
    for other_name, mutuals in connections.items():
        if mutuals > 3:
            probably.append(str(other_name))
        else:         
            might.append(str(other_name))

But I only end up with my two lists:

['3', '4', '7', '1', '11', '7', '6', '4', '3', '6', '1', '11', '12', '8', '10',
 '5', '3', '4', '5', '8', '12', '4', '7', '6', '2', '9', '8', '5', '3', '8',
 '10', '2', '2', '9', '10', '3', '5', '3', '6', '1', '6', '4', '8', '10']

['11', '9', '9', '1', '2', '5']

How can I associate these with the proper names?

Forgot to mention that at the end paired:dict looks like this: pair_dict = {2: 4, 8: 3, 10: 3, 5: 4} — Thesqlkid, Feb 17 '21 at 02:34
Welcome to Stack Overflow! Reading this... makes my head spin. I think I finally understand what you're trying to do, but when it's that hard to see, you might have an [XY problem](https://xyproblem.info/). It's hard to say without knowing what all this represents, but it's possible you could make your data itself clearer (to yourself and others) using either [namedtuples](https://www.geeksforgeeks.org/namedtuple-in-python/) or dictionaries with helpfully named keys. — CrazyChucky, Feb 17 '21 at 02:47
Thanks. The problem I am trying to solve is this: I started with a list of numbers that are supposed to represent people. Each person has its connections after it. Based on the people each person is connected to, I need to find people they are not currently connected to but 'might' know. So if a 'person' is connected to 2 or 3 of their connections, they go in the might category; 4 or more goes in the probably category. For this specific issue, I have their possible connections and the number of times that 'person' is found in their connections. — Thesqlkid, Feb 17 '21 at 02:59
Should I repost with more meaningful names? the probably and might are descriptive but I see what you are saying on the others. — Thesqlkid, Feb 17 '21 at 03:00
I submitted what I hope is a helpful edit. Please look it over, and accept it if you're happy with it. — CrazyChucky, Feb 17 '21 at 03:22
So the number outside the brackets is supposed to represent a person. The number before the colon is the actual number found in their connections' connections, and the number after the colon is the number of times it appears in their connections' connections. I am trying to split it into lists where numbers with a 4 or greater after the colon go in the probably list, those with a 2 or 3 after the colon go in the might list, and those with less than 2 do not go in either list. So for the number 6, the number 3 appears 3 times (3: 3), 4 appears 3 times (4: 3), and 7 appears 2 times (7: 2). — Thesqlkid, Feb 17 '21 at 03:25

score 0 · Accepted Answer · answered Feb 17 '21 at 03:46

Your desired output is essentially a dictionary, so it makes sense to build it as such. Each key will be a name; each value will be another dictionary, with keys 'might' and 'probably'. (Both of those values are lists.)

output = {}
for name, connections in network:
    # If we've not added this name yet, create a blank entry:
    if name not in output:
        output[name] = {'probably': [], 'might': []}
    
    # Now loop through the connected people and add to the correct list:
    for other_name, mutuals in connections.items():
        if mutuals > 3:
            output[name]['probably'].append(other_name)
        else:
            output[name]['might'].append(other_name)
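
As a rough sketch, the same loop can be wrapped in a function so the cut-offs are adjustable. The function name `group_connections` and the parameters `probably_min` and `might_min` are made up for this illustration; `might_min` is there only because the comments on the question mention that counts below 2 might need to be left out of both lists.

```python
def group_connections(network, probably_min=4, might_min=1):
    """Split each person's connections into 'might' and 'probably' lists.

    probably_min and might_min are illustrative names, not part of the
    original code.
    """
    output = {}
    for name, connections in network:
        # setdefault creates the blank entry the first time a name is seen.
        entry = output.setdefault(name, {'probably': [], 'might': []})
        for other_name, mutuals in connections.items():
            if mutuals >= probably_min:
                entry['probably'].append(other_name)
            elif mutuals >= might_min:
                entry['might'].append(other_name)
            # Anything below might_min is left out of both lists.
    return output


# A small slice of the question's data, for illustration.
sample = [
    (9, {2: 4, 8: 3, 10: 3, 5: 4}),
    (13, {}),
]
grouped = group_connections(sample)
print(grouped)
```

With the default arguments this should build the same dictionary as the loop above; passing `might_min=2` would drop anyone with a single mutual.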

At this point, we can use Python's pprint function to check we're on the right track. (It's much more readable than print for nested structures like this.)

from pprint import pprint
pprint(output)

Output:

{1: {'might': [7, 6, 4], 'probably': [11]},
 2: {'might': [8, 10, 5], 'probably': [9]},
 3: {'might': [5, 8, 12, 4, 7, 6], 'probably': []},
 4: {'might': [3, 6, 1, 11, 12], 'probably': []},
 5: {'might': [3, 8, 10, 2], 'probably': [9]},
 6: {'might': [3, 4, 7, 1, 11], 'probably': []},
 7: {'might': [3, 6, 1], 'probably': []},
 8: {'might': [2, 9, 10, 3, 5], 'probably': []},
 9: {'might': [8, 10], 'probably': [2, 5]},
 10: {'might': [2, 9, 8, 5], 'probably': []},
 11: {'might': [6, 4], 'probably': [1]},
 12: {'might': [3, 4], 'probably': []},
 13: {'might': [], 'probably': []}}

(Note that pprint automatically sorts the keys for display: they're not actually in that order.)
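
If you would rather see the keys in the order they were inserted, `pprint` and `pformat` accept `sort_dicts=False` (available from Python 3.8). A minimal sketch, using a made-up two-entry dictionary rather than the full output:

```python
from pprint import pformat

# Keys inserted deliberately out of numeric order.
example = {6: {'might': [3, 4], 'probably': []},
           1: {'might': [7], 'probably': [11]}}

sorted_view = pformat(example)                       # keys sorted for display
insertion_view = pformat(example, sort_dicts=False)  # keys in insertion order

print(sorted_view)
print(insertion_view)
```

The first line printed starts with key 1, the second with key 6, which shows that the sorting is purely a display choice.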

Now all we need to do is format it for display, however we please. I've left the names as integers up until this point, so that we can sort them correctly (and not have 11 end up before 2, as would happen when sorting strings). If the assignments below look complicated, have a look at str.join and list comprehensions. And you may or may not know about f-strings, which are also quite handy (and don't even require the variables to be strings!).

for name, contents in sorted(output.items()):
    print(f'Name: {name}')

    might = ', '.join([str(i) for i in sorted(contents['might'])])
    print(f'\tMight know: {might}')

    probably = ', '.join([str(i) for i in sorted(contents['probably'])])
    print(f'\tProbably knows: {probably}')

Output:

Name: 1
    Might know: 4, 6, 7
    Probably knows: 11
Name: 2
    Might know: 5, 8, 10
    Probably knows: 9
Name: 3
    Might know: 4, 5, 6, 7, 8, 12
    Probably knows: 
Name: 4
    Might know: 1, 3, 6, 11, 12
    Probably knows: 
Name: 5
    Might know: 2, 3, 8, 10
    Probably knows: 9
Name: 6
    Might know: 1, 3, 4, 7, 11
    Probably knows: 
Name: 7
    Might know: 1, 3, 6
    Probably knows: 
Name: 8
    Might know: 2, 3, 5, 9, 10
    Probably knows: 
Name: 9
    Might know: 8, 10
    Probably knows: 2, 5
Name: 10
    Might know: 2, 5, 8, 9
    Probably knows: 
Name: 11
    Might know: 4, 6
    Probably knows: 1
Name: 12
    Might know: 3, 4
    Probably knows: 
Name: 13
    Might know: 
    Probably knows:
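
To see why the names stay integers until the last moment, here is a small standalone sketch (the list `names` is just example data) comparing the two sort orders and then joining the numerically sorted result:

```python
names = [11, 2, 9]

# Sorting integers compares numeric values.
numeric_order = sorted(names)
print(numeric_order)

# Sorting strings compares character by character, so '11' sorts before '2'.
string_order = sorted(str(n) for n in names)
print(string_order)

# str.join needs strings, so convert only after sorting numerically.
line = ', '.join(str(n) for n in sorted(names))
print(f'Might know: {line}')
```

Converting to strings first and sorting afterwards would put 11 ahead of 2 in the printed list.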

That worked. Thank you so much. I have to play around with the printing a little bit as they want it formatted a little differently than here but the hard part is done. Thanks for all your help. Now I know to explain the problem I am trying to solve instead of the problem I am having. Thanks. — Thesqlkid, Feb 17 '21 at 04:06
@Thesqlkid If you [don't want the spaces](https://stackoverflow.com/questions/66251925/how-to-trim-spaces-in-an-f-string) in the comma-separated list of numbers, just take the space out of `', '`. — CrazyChucky, Feb 18 '21 at 21:53
