
My task: given a set of data in CSV format, display a Sankey chart using D3.

Given data format (I'm unable to change this):

Uses,Types,Feedback
Use1,Type1,Feedback1
Use2,Type1,Feedback1
Use2,Type2,Feedback1
...
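buildJSON (below) reads the column names with `_.keys(csv[0])` and the cells with `_.values`, so it expects `csv` to be an array of row objects keyed by the header line. That is the shape D3's CSV loader produces. The following is a minimal sketch of that step, assuming plain comma-separated fields with no quotes, embedded commas, or embedded newlines. `parseRows` is a hypothetical helper name. For real spreadsheet exports a proper CSV parser (such as the Papa Parse library mentioned in the comments) is the safer choice.

```javascript
// Minimal CSV-to-row-objects parser (a sketch, not a full CSV implementation).
// Assumption: fields contain no quotes, commas, or embedded newlines.
function parseRows(text) {
  // Split on \n or \r\n and drop blank lines (e.g. a trailing newline).
  const lines = text.split(/\r?\n/).filter((line) => line.trim() !== "");
  if (lines.length === 0) return [];
  const headers = lines[0].split(",").map((h) => h.trim());
  return lines.slice(1).map((line) => {
    const cells = line.split(",");
    const row = {};
    headers.forEach((header, i) => {
      // Missing trailing cells become empty strings rather than undefined.
      row[header] = (cells[i] || "").trim();
    });
    return row;
  });
}

const sample = "Uses,Types,Feedback\nUse1,Type1,Feedback1\nUse2,Type1,Feedback1\nUse2,Type2,Feedback1\n";
console.log(parseRows(sample)[2]); // { Uses: 'Use2', Types: 'Type2', Feedback: 'Feedback1' }
```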

Required format for the D3 Sankey plugin:

{ "nodes": [
  {"name": "Use1"},
  {"name": "Use2"},
  {"name": "Type1"},
  ...
], "links": [
  {"source":"Use1", "target":"Type1", "value":1},
  ...
]}
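In the format above, links refer to nodes by name. As far as I recall, the classic D3 Sankey plugin (`sankey.js` from d3-plugins) resolves numeric `source`/`target` values to nodes and otherwise expects node objects, not name strings. This is why a name lookup is needed at some point. The following sketch rewrites name-based links into index-based ones, assuming node names are unique. `linksByIndex` is a hypothetical helper.

```javascript
// Convert links that reference nodes by name into links that reference
// nodes by their index in graph.nodes (linksByIndex is a hypothetical helper).
function linksByIndex(graph) {
  // Build the name -> index lookup once, so each link resolves in O(1).
  const indexOf = new Map();
  graph.nodes.forEach((node, i) => indexOf.set(node.name, i));

  return {
    nodes: graph.nodes,
    links: graph.links.map((link) => {
      const source = indexOf.get(link.source);
      const target = indexOf.get(link.target);
      if (source === undefined || target === undefined) {
        throw new Error("Unknown node in link: " + link.source + " -> " + link.target);
      }
      return { source, target, value: link.value };
    }),
  };
}

const graph = linksByIndex({
  nodes: [{ name: "Use1" }, { name: "Use2" }, { name: "Type1" }],
  links: [
    { source: "Use1", target: "Type1", value: 1 },
    { source: "Use2", target: "Type1", value: 1 },
  ],
});
console.log(graph.links[1]); // { source: 1, target: 2, value: 1 }
```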

My problem: convert the CSV data into the JSON needed for the Sankey chart. I cannot change the initial data given to me, so I must build the JSON dynamically.

My research led me here, but the only example I found of massaging CSV data that does not already include values (only sources and targets) used MySQL. As I do not have access to a database on my project, I have resorted to using Underscore.js to help with the conversion (within a Backbone.js application).

Here is what I have so far, which works as intended.

// buildJSON is a method of a Backbone View that oversees the creation of the diagram
buildJSON: function( csv ) {
    var json = {
        nodes: [], // unique nodes found in data
        links: []  // links between nodes
    };

    // get unique nodes!
    var uniqueNodes = _.chain(csv).map(_.values).flatten().unique().value().sort();
    uniqueNodes.forEach(function( node ) {
        json.nodes.push({ name: node });
    });

    // map colors to nodes
    this.color.domain(uniqueNodes);

    // map links
    var links = [];
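    var rMap = {};
    // join each pair with a character unlikely to occur in CSV text, so names
    // containing '-' (e.g. "Type-A") still split back into the right source/target
    var SEP = '\u0000';
    var keys = _.keys(csv[0]);
    for ( var i = 0; i < csv.length; i++ ) {
        for ( var j = 0; j < keys.length - 1; j++ ) {
            var relationship = csv[i][keys[j]] + SEP + csv[i][keys[j + 1]];
            rMap[relationship] = (rMap[relationship] || 0) + 1;
        }
    }

    // create links from the linkmap
    for ( var r in rMap ) {
        if ( rMap.hasOwnProperty(r) ) {
            var rel = r.split(SEP);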
            links.push({
                source: rel[0],
                target: rel[1],
                value: rMap[r]
            });
        }
    }

    var nodeMap = {};
    json.nodes.forEach(function( node ) { nodeMap[node.name] = node; });
    json.links = links.map(function( link ) {
        return {
            source: nodeMap[link.source],
            target: nodeMap[link.target],
            value: link.value
        };
    });

    return json;
}
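For part 1, one possible direction is sketched below. It assumes an ES2015 environment (`Map`, `Set`, `for...of`) and that every row shares the header keys of `rows[0]`. It is not the asker's code, and `buildSankeyData` is a hypothetical name. The sketch replaces the Underscore chain with a `Set`. If I read Underscore's source correctly, `_.uniq` on an unsorted array checks each value against the results collected so far, so its cost grows with values × distinct values. The sketch also counts pairs in a nested `Map` keyed by node index, so no string keys are built and split again. Nodes stay sorted as in the original, so `this.color.domain(data.nodes.map(function (n) { return n.name; }))` would give the same domain.

```javascript
// Single-pass Sankey data builder (a sketch under the assumptions above).
function buildSankeyData(rows) {
  if (rows.length === 0) return { nodes: [], links: [] };
  const keys = Object.keys(rows[0]);

  // 1) Collect distinct node names with a Set (O(1) membership per value).
  const names = new Set();
  for (const row of rows) {
    for (const key of keys) names.add(row[key]);
  }
  // Sorted like the original uniqueNodes.
  const nodeNames = Array.from(names).sort();
  const nodes = nodeNames.map((name) => ({ name }));
  const indexOf = new Map(nodeNames.map((name, i) => [name, i]));

  // 2) Count adjacent-column pairs: sourceIndex -> Map(targetIndex -> count).
  const counts = new Map();
  for (const row of rows) {
    for (let j = 0; j < keys.length - 1; j++) {
      const s = indexOf.get(row[keys[j]]);
      const t = indexOf.get(row[keys[j + 1]]);
      let targets = counts.get(s);
      if (!targets) counts.set(s, (targets = new Map()));
      targets.set(t, (targets.get(t) || 0) + 1);
    }
  }

  // 3) Flatten the counts into links that reference nodes by index.
  const links = [];
  for (const [source, targets] of counts) {
    for (const [target, value] of targets) links.push({ source, target, value });
  }
  return { nodes, links };
}

const example = buildSankeyData([
  { Uses: "Use1", Types: "Type1", Feedback: "Feedback1" },
  { Uses: "Use2", Types: "Type1", Feedback: "Feedback1" },
  { Uses: "Use2", Types: "Type2", Feedback: "Feedback1" },
]);
console.log(example.links.length); // 5
```

If the rest of the view expects node objects rather than indices, a final `links.forEach` that sets `link.source = nodes[link.source]` (and likewise for `target`) restores the shape buildJSON returns.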

This wouldn't be such an issue with a small data set, but the data may contain thousands of rows and possibly up to ~10 columns.
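The link-counting work is easy to estimate. Each row adds one count per adjacent column pair, so the loop performs rows × (columns − 1) updates. For example, 5,000 rows with 10 columns gives 5,000 × 9 = 45,000. The same number also works as a cheap correctness check: the `value`s of all generated links must sum to it. The helpers below are hypothetical (`makeRows`, `expectedLinkWeight`, `checkLinkWeight`). They generate deterministic synthetic data shaped like the question's CSV for load-testing any converter. Column-prefixed labels are an assumption, chosen so the same label never appears in two columns.

```javascript
// Deterministic rows: column c holds one of `distinct` labels such as "C2_V7".
function makeRows(rowCount, columnCount, distinct) {
  const headers = Array.from({ length: columnCount }, (unused, c) => "Col" + c);
  const rows = [];
  for (let r = 0; r < rowCount; r++) {
    const row = {};
    headers.forEach((h, c) => {
      row[h] = "C" + c + "_V" + ((r * (c + 1)) % distinct);
    });
    rows.push(row);
  }
  return rows;
}

// Each row contributes one unit of weight per adjacent column pair.
function expectedLinkWeight(rowCount, columnCount) {
  return rowCount * (columnCount - 1);
}

// True when the converter's link values add up to the expected total.
function checkLinkWeight(links, rowCount, columnCount) {
  const total = links.reduce((sum, link) => sum + link.value, 0);
  return total === expectedLinkWeight(rowCount, columnCount);
}

console.log(expectedLinkWeight(5000, 10)); // 45000
```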

So, long story short, my question comes in two parts:

  1. Are there any obvious performance gains I can achieve, and

  2. Is there a better (more efficient) way of massaging data for a Sankey Chart in D3?

I realize this is a particularly narrow issue, so I appreciate any and all help with this!

chazsolo
  • BTW, here is a good JS CSV parser: http://papaparse.com – Konstantin V. Salikhov Oct 16 '14 at 17:53
  • @KonstantinV.Salikhov just tried out their demo - awesome! I'll take a deeper look into this one. – chazsolo Oct 16 '14 at 17:59
  • Do your restrictions mean your webapp is being _served_ the CSV data, or just that you have to process it? You're fairly clear you can't possibly do it serverside ahead of time (assuming the data isn't dynamic), but given the slightest chance, that's the way to go. – mgold Oct 17 '14 at 02:00
  • @mgold at this point it's not a true webapp, just a locally run page that will process a file on the file system. The data isn't dynamic, but is an aggregate of a bunch of Excel spreadsheets. It's looking more like processing the file into JSON before running it through the application may be my only choice – chazsolo Oct 17 '14 at 12:18
  • @chazsolo In that case, you can even run your existing code in node.js _once_, save the json output to a file, and serve that. Or rewrite the conversion in a language of your choice. – mgold Oct 17 '14 at 12:31
  • @mgold I'll do some research and see if that's possible for this project - thanks for the advice! – chazsolo Oct 17 '14 at 12:41
