How to convert table data to nodes and links for d3js Sankey

Question

We have a project where we are working several thousand items through a multi-step process. I want to create a multi-group Sankey diagram to help visualize where we are in the process. This is to run in a browser and is client-side javascript. I followed this demo and have it up and running.

The challenge I face is how to turn my table of data into the links and nodes needed for a Sankey diagram. In that example the data passed to the chart is organized like this:

{ "nodes":[
    {"node":0,"name":"node0"},
    {"node":1,"name":"node1"},
    {"node":2,"name":"node2"},
    {"node":3,"name":"node3"},
    {"node":4,"name":"node4"}
],
"links":[
    {"source":0,"target":2,"value":2},
    {"source":1,"target":2,"value":2},
    {"source":1,"target":3,"value":2},
    {"source":0,"target":4,"value":2},
    {"source":2,"target":3,"value":2},
    {"source":2,"target":4,"value":2},
    {"source":3,"target":4,"value":4}
]}

I'm starting with raw data of an element for each item in our project and each item will have the following (truncated for clarity) list of columns and sample values:

ID  Process  Owner   Decision  Status
01  quick    group1  retire    done
02  standard group2  replace   working
03  quick    none    none      hold
04  quick    group2  retire    working

There are several other columns and values for each one, but I think that gives an idea. It is coming from ajax and in json format (not fixed width columns), but that is the general structure.

How do I convert that to links and nodes?

I found this thread which is asking the same thing (I think) but is for R and I don't have enough experience with that to follow the answer.

I've googled for answers. Most tutorials I found assume you've got the data in the node/links format. And, like the thread above, I found a couple talking about R, php, or SQL for turning the data into nodes and links -- neither of which do I understand.

The data changes hourly-daily and I want to be able to have this just load up for anyone who wants to check at anytime. So I need an automated solution.

I've got the raw data and a model for creating the diagram. I'm just missing how to convert the data programmatically.

EDIT

I already have my raw data into my code. That isn't the problem. The issue is that all of the tutorials assume you've already got nodes and links. I have one row per item. These are not the same thing. I don't need help with json, ajax, or loading files. I appreciate the suggestions, but they are completely in the wrong direction.

I have added an answer below to help you parse your tsv data. Assuming your data is tab-delimited as you have shown above. If you need more help on how to split the data into nodes and links however you would need to provide more information as to what you see as nodes and what links you would like to see. — Coola, Jan 31 '19 at 15:16
Upon further reading maybe what you want is for each ID to be a path and each process, owner, decision and status to be nodes then link between each of them with a value 1? Am I understanding correct? — Coola, Jan 31 '19 at 15:59

Coola · Answer 1 · 2019-01-31T20:37:19.760

Update

Here is the full working Sankey diagram with your data: Sankey diagram from multiple column csv

Based on further information of requirements and improving the codes I came up with this:

  var keys = Object.keys(data[0]); //get the headers for the data
  keys.splice(keys.indexOf('ID'), 1)//remove the IDs key

  data.forEach(function(d){
    keys.forEach(function (key, i){
      if (d[key]!="none") { adduniquenodes(d[key]); } //add node if not "none"
        var c = 1; //checks next column to the i
        if (d[keys[i+c]]!= undefined && d[key] !== "none"){
          while (d[keys[i+c]] === "none"){
            c = c+1;     //jump to next column if "none" found
           }
          graph.links.push ({
            "source" : d[key],
            "target" : d[keys[i+c]],
            "value" : countvalues(key,d[key],keys[i+c],d[keys[i+c]]) 
          });
        }
      })
  });

function adduniquenodes(value) {
      if (graph.nodes.indexOf(value) === -1){
        graph.nodes.push(value);
      }
}

function countvalues (sourcekey, source, targetkey, target) {
      var c = 0;
      data.forEach (function (d){
        if (d[sourcekey] === source && d[targetkey]===target){
          c++;
        }

      });
      return c;
 }
 console.log(graph);

First Answer:

May be this will help: The following code puts each Process, Owner, Decision and Status as unique nodes and the links as well.

Note still not sure what value you would like so I have put 1 as the value.

  graph = {"nodes" : [], "links" : []};
    data.forEach(function(d){
      adduniquenodes(d.Process);
      adduniquenodes(d.Owner);
      adduniquenodes(d.Decision);
      adduniquenodes(d.Status);
      graph.links.push ( {"source" : d.Process,
                            "target" :d.Owner,
                          "value" : countvalues('Process',d.Process,'Owner',d.Owner)                         
                         });
      graph.links.push ( {"source" : d.Owner,
                          "target" :d.Decision,
                          "value" : countvalues('Owner',d.Owner,'Decision',d.Decision)                       
                         });
      graph.links.push ( {"source" : d.Decision,
                          "target" :d.Status,
                          "value" : countvalues('Decision',d.Decision,'Status',d.Status)                        
                         });
    });

    function adduniquenodes(value) {
      if (graph.nodes.indexOf(value) === -1){
        graph.nodes.push(value);
      }
    }

    function countvalues (sourcekey, source, targetkey, target) {
      var c = 0;
      data.forEach (function (d){
        if (d[sourcekey] === source && d[targetkey]===target){
          c++;
        }
      });
      return c;
      }
console.log(graph);

This is a start in the right direction. The value is the big challenge, for example in your links[0] you have quick -> group1. The value is how many "quick processes have group1" But notice in my example that I have a "quick process" that goes directly to hold status. I would also need a quick -> hold. — Rothrock, Jan 31 '19 at 18:28
Just trying to further understand: would you also have none in the Process column or do all your rows have some process value? Also would your rows have Status column values as well? — Coola, Jan 31 '19 at 19:02
You can add a countvalues function to count the values. I have updated the answer to count the values for each. — Coola, Jan 31 '19 at 19:17
This is probably not the best code though and can be optimized for sure. — Coola, Jan 31 '19 at 19:17
I have added updated code which is shorter and easier to read and also should solve your problems of quick --> hold. Please check. — Coola, Jan 31 '19 at 19:44
Ok its working now. I built a block with the code and it looks like a Sankey diagram. Please check and let me know if this is good enough. Hope this helps. — Coola, Jan 31 '19 at 20:03
Please accept as the answer unless you have any other concerns. — Coola, Feb 02 '19 at 12:01
@Rothrock you have still not accepted the answer. I have solved your Sankey problem and produced a Sankey diagram from your flat data in the above updated answer. Please check and let me know if you still need help with this. — Coola, Mar 01 '19 at 19:03

How to convert table data to nodes and links for d3js Sankey

1 Answers1