Removing circular data in a dictionary of dictionaries for use in Sankey diagram

Question

I've hit a challenge with Sankey diagrams for my personal accounting app. The issue is that the Sankey diagram, generated using Google Charts, won't render if the cash moves in a circle, e.g. from Ledger A to Ledger B to Ledger C and back to A.

The script needs an extra step that evaluates the data and, if there's a circular movement, breaks that circle by removing the link with the lowest value.

The code starts with a dictionary of dictionaries which contains the amount of cash that's moving between each possible pair of ledgers. Zeros mean there's no cash moving between the two ledgers. This dictionary of dictionaries should be evaluated and circles broken.
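To make the structure concrete, here is a minimal sketch that flattens such a dictionary of dictionaries into a list of (source, target, amount) links, skipping the zeros. The names `to_links` and `sample` are hypothetical and only for illustration; the sketch assumes a zero always means "no transfer".

```python
def to_links(all_ledgers):
    """Return (source, target, amount) for every pair with a positive amount."""
    return [
        (source, target, amount)
        for source, row in all_ledgers.items()
        for target, amount in row.items()
        if amount > 0
    ]


# Hypothetical three-ledger sample, not the real app data.
sample = {
    "a": {"a": 0, "b": 5, "c": 0},
    "b": {"a": 0, "b": 0, "c": 2},
    "c": {"a": 1, "b": 0, "c": 0},
}

links = to_links(sample)
print(links)  # [('a', 'b', 5), ('b', 'c', 2), ('c', 'a', 1)]
```

In this sample the links form the circle a to b to c and back to a, and the link with the lowest value is c to a.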

Below is some code based on @JacobDavis's response. It finds circles, but not always. In the example below, Ledger A leads to B, but B leads to both C and D. The code only follows C and thus misses the cycle through D (A to B to D and back to A).

The code doesn't yet try to break the cycle by removing a link; I'm trying to get it to identify cycles first.

all_ledgers = {
  "a": {"a": 0, "b": 1, "c": 0, "d": 0, "e": 0},
  "b": {"a": 0, "b": 0, "c": 1, "d": 1, "e": 0},
  "c": {"a": 0, "b": 0, "c": 0, "d": 0, "e": 1},
  "d": {"a": 1, "b": 0, "c": 0, "d": 0, "e": 0},
  "e": {"a": 0, "b": 0, "c": 0, "d": 0, "e": 0}
}

def evaluate_for_circular(dictionary_of_dictionaries):
  
  output = [] # Will put circular ledgers into here.
  
  def evaluate(start,target=None,cache=None):
  
      if not target:
          target=start
      if not cache:
        cache=[start]
      # Here we are looking at a new row and will iterate through rows until we find our target (which is the same as the row ledger)
      print('Evaluating. Start: '+str(start)+'. Target: '+str(target))
  
      for ledger in dictionary_of_dictionaries[start].keys():
          # We now iterate through the items in the row. We use the keys rather than values as we're looking
          # for the target.
          print('Dealing with ledger '+str(ledger))
          print('Cache: '+str(cache))
          if dictionary_of_dictionaries[start][ledger]>0 and ledger==target:
              return ledger
          elif dictionary_of_dictionaries[start][ledger]>0 and ledger not in cache:
              cache.append(ledger)
              return evaluate(ledger,target,cache)
              #return evaluate(ledger,target)
      return False

  for dict in dictionary_of_dictionaries.keys():
    print('--')
    print('Starting evaluation of row '+str(dict))
    if evaluate(dict):
      output.append(dict)
      
  if output:
    return output
  else:
    return False
 
q = evaluate_for_circular(all_ledgers)

if q:
    print("Circle found in data: "+str(q))
else:
    print("No circle found in data.")

Could you edit your question to include the relevant code for a reproducible minimal example? — sytech, Jan 22 '22 at 15:35

SiP · Accepted Answer · 2022-02-03T07:21:28.587

Working from @JacobDavis's answer: you need to let the loop finish. As written, it returns as soon as it follows the first positive transfer, so it can exit before the remaining transfers in that row are checked.

ledgers = {
  "a": {"a": 0, "b": 2, "c": 3, "d": 0, "e": 0},
  "b": {"a": 0, "b": 0, "c": 1, "d": 1, "e": 0},
  "c": {"a": 1, "b": 0, "c": 0, "d": 0, "e": 1},
  "d": {"a": 1, "b": 0, "c": 0, "d": 0, "e": 0},
  "e": {"a": 0, "b": 0, "c": 0, "d": 0, "e": 0}
}


def find_cycle(all_ledgers: dict, start: str, target="", cycle=None, v=None):
    if cycle is None:
        cycle = [start]
    else:
        cycle = cycle + [start]

    if v is None:
        v = []

    if target == "":
        target = start

    for ledger in all_ledgers[start].keys():
        if all_ledgers[start][ledger] > 0 and ledger == target:
            v = v + [all_ledgers[start][ledger]]
            return True, cycle, v
        elif all_ledgers[start][ledger] > 0 and ledger not in cycle:
            flag, cycle2, v = find_cycle(all_ledgers, ledger, target, cycle, v)
            if flag:  # this check will make the loop continue over all sub-ledgers
                v = v + [all_ledgers[start][ledger]]
                return flag, cycle2, v

    return False, [], []

This should find the first cycle and return the ledgers involved and the transfer values in the cycle (in reversed order).
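As a sketch of how that return value can be read, the ledgers in the cycle can be paired with the (re-reversed) transfer values to get explicit links. `cycle_links` is a hypothetical helper, not part of the code above, and the sample inputs are illustrative values of the shape `find_cycle` returns.

```python
def cycle_links(cycle, reversed_values):
    """Pair each ledger in the cycle with the next one (wrapping around to the
    first) and attach the transfer amount for that link."""
    values = list(reversed(reversed_values))  # put amounts back in path order
    targets = cycle[1:] + cycle[:1]           # next ledger for each position
    return list(zip(cycle, targets, values))


# Illustrative inputs: a cycle a -> b -> c -> a with values in reversed order.
links = cycle_links(["a", "b", "c"], [1, 1, 2])
print(links)  # [('a', 'b', 2), ('b', 'c', 1), ('c', 'a', 1)]

# The first link with the smallest amount is the candidate to remove.
weakest = min(links, key=lambda link: link[2])
print(weakest)  # ('b', 'c', 1)
```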

A way to use this information to break the cycles:

def break_cycle(all_ledgers: dict):
    for sub_ledger in all_ledgers.keys():
        chk, cyc, u = find_cycle(all_ledgers, sub_ledger)
        u = list(reversed(u))  # put the transfer values back in path order
        if chk:
            print(f"There is a cycle starting in ledger {sub_ledger}: {cyc}")
            print(f'cycle values found: {u}')
            min_transfer_index = u.index(min(u))
            ledger1 = cyc[min_transfer_index]
            if min_transfer_index == len(cyc)-1:
                ledger2 = cyc[0]  # the last link wraps back to the start
            else:
                ledger2 = cyc[min_transfer_index + 1]
            print(f'setting the transfer value from {ledger1} to {ledger2} to 0')
            all_ledgers[ledger1][ledger2] = 0
            return True
        else:
            print(f"There is no cycle starting in ledger {sub_ledger}")
    return False

And finally to break all the cycles:

while break_cycle(ledgers):
    continue

Note that this is not very efficient, as it scans the whole data set from the beginning every time a cycle is broken, but it should get the job done.
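If you want to double-check that no cycles remain afterwards, one option is the standard library's `graphlib` module. This is only a sketch, assuming Python 3.9 or later; `has_cycle` is a hypothetical helper name and the sample data is the example from the question.

```python
from graphlib import CycleError, TopologicalSorter


def has_cycle(all_ledgers):
    """Return True if the positive transfers contain at least one cycle."""
    # Map each ledger to the set of ledgers it sends cash to.
    graph = {
        source: {target for target, amount in row.items() if amount > 0}
        for source, row in all_ledgers.items()
    }
    try:
        # prepare() raises CycleError when the graph is not acyclic.
        TopologicalSorter(graph).prepare()
    except CycleError:
        return True
    return False


data = {
    "a": {"a": 0, "b": 1, "c": 0, "d": 0, "e": 0},
    "b": {"a": 0, "b": 0, "c": 1, "d": 1, "e": 0},
    "c": {"a": 0, "b": 0, "c": 0, "d": 0, "e": 1},
    "d": {"a": 1, "b": 0, "c": 0, "d": 0, "e": 0},
    "e": {"a": 0, "b": 0, "c": 0, "d": 0, "e": 0},
}

before = has_cycle(data)   # a -> b -> d -> a is a cycle
data["d"]["a"] = 0         # remove one link of that cycle by hand
after = has_cycle(data)
print(before, after)  # True False
```

This only reports whether a cycle exists; it does not pick which link to remove, so it complements the functions above rather than replacing them.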

Hi @SiP just wanted to let you know the code works and has successfully broken the cycles in my data. My app is now able to generate beautiful Sankey diagrams. — rcx935, Feb 06 '22 at 08:05
@rcx935 Happy to hear, please consider accepting the answer for posterity. — SiP, Feb 07 '22 at 09:42

score 1 · Answer 2 · answered Jan 22 '22 at 15:35

Check out the example below. "all_ledgers" is supposed to represent your top-level dictionary. It loops through the keys with positive values recursively, returning True if it ever encounters the initial target (the start value on the first iteration). If it makes it through the whole search without finding a cycle, it returns False. The cache is there to prevent an infinite loop.

all_ledgers={'a':{'b':0,'c':1},
      'b':{'a':1},
      'c':{'b':1}}

def find_cycle(start, target="", cache=None):
    if target == "":
        target = start
    # Share one cache across the recursive calls so visited ledgers are remembered.
    if cache is None:
        cache = [start]
    for ledger in all_ledgers[start].keys():
        if all_ledgers[start][ledger] > 0 and ledger == target:
            return True
        elif all_ledgers[start][ledger] > 0 and ledger not in cache:
            cache.append(ledger)
            return find_cycle(ledger, target, cache)
    return False
        
if find_cycle('a'):
    print("There is a cycle.")
else:
    print("There is no cycle.")

answered Jan 22 '22 at 15:35

Jacob Davis

Hi, thank you. I've managed to follow what the code does. I would never have been able to visualise that. I'm going to try to tweak the code to better fit my usecase. If I get something suitable working then I'll post back here as I think a function like this would be useful for anyone who's using Sankey diagrams. Thank you again @jacobDavis – rcx935 Jan 23 '22 at 08:26
Hi Jacob. I've found a scenario where the code doesn't find a circle. I've edited the question to reflect this. I may have to put this question up for bounty. – rcx935 Feb 01 '22 at 08:07
Hi Jacob, the code worked great in initial tests but not in all scenarios. I've edited the question with an example. – rcx935 Feb 01 '22 at 21:38
Sorry rcx935. I haven't logged into stackoverflow for a while, and I just saw this. And thanks @SiP for fixing the code. I tried to upvote, but I don't have enough reputation points yet. – Jacob Davis Jul 10 '22 at 14:20
