This is not a homework task (please see my profile). I do not have a computer science background, and this question came up in an applied machine learning problem. I am pretty sure that I am not the first person to have this problem, hence I am looking for an elegant solution. I would prefer a solution using a Python library over a raw implementation.
Assume we have a dictionary connecting letters and numbers as input:
connected = {
'A': [1, 2, 3],
'B': [3, 4],
'C': [5, 6],
}
Each letter can be connected to multiple numbers, and one number can be connected to multiple letters, but each letter can only be connected to a given number once.
If we look at the dictionary, we see that the number 3 is connected with both the letter 'A' and the letter 'B', hence we can put 'A' and 'B' into one cluster. The numbers of the letter 'C' are not present in the other letters, so we cannot cluster the letter 'C' any further. The expected output should be:
cluster = {
'1': {
'letters': ['A', 'B'],
'numbers': [1, 2, 3, 4],
},
'2': {
'letters': ['C'],
'numbers': [5, 6],
}
}
I think this is related to graph algorithms and connected subgraphs (connected components), but I do not know where to start.
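To make the graph idea concrete, here is a minimal sketch of how it could look, not a definitive solution. It treats every letter and every number as a node, adds an edge for each letter/number pair, and collects the connected components with a plain depth-first search. The function name `cluster_letters` and the `("letter", ...)` / `("number", ...)` node tags are my own assumptions for illustration. It uses only the standard library; a graph library such as networkx offers `connected_components`, which should be able to replace the manual search.

```python
from collections import defaultdict


def cluster_letters(connected):
    """Group letters that share at least one number (directly or transitively)."""
    # Build an undirected graph. Nodes are tagged so that a letter and a
    # number can never be mistaken for each other.
    graph = defaultdict(set)
    for letter, numbers in connected.items():
        graph[("letter", letter)]  # keep letters that have no numbers at all
        for number in numbers:
            graph[("letter", letter)].add(("number", number))
            graph[("number", number)].add(("letter", letter))

    seen = set()
    cluster = {}
    for start in list(graph):
        if start in seen:
            continue
        # Depth-first search that collects one connected component.
        seen.add(start)
        stack = [start]
        component = []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        # Cluster ids are assigned in order of discovery: '1', '2', ...
        cluster[str(len(cluster) + 1)] = {
            "letters": sorted(v for kind, v in component if kind == "letter"),
            "numbers": sorted(v for kind, v in component if kind == "number"),
        }
    return cluster


connected = {
    'A': [1, 2, 3],
    'B': [3, 4],
    'C': [5, 6],
}
print(cluster_letters(connected))
```

With the example dictionary above, 'A' and 'B' end up in one cluster because they share the number 3, and 'C' stays on its own.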