My task: given a set of data in CSV format, display a Sankey chart using D3.
Given data format (I'm unable to change this):
Uses,Types,Feedback
Use1,Type1,Feedback1
Use2,Type1,Feedback1
Use2,Type2,Feedback1
...
Required format for D3 Sankey plugin:
{ "nodes": [
{"name": "Use1"},
{"name": "Use2"},
{"name": "Type1"},
...
], "links": [
{"source":"Use1", "target":"Type1", "value":1},
...
] }
My problem: convert the CSV data into the JSON needed for the Sankey chart. I cannot change the initial data given to me, so I must build the JSON dynamically.
My research led me here, but the only example of massaging CSV data that contains only sources and targets (no precomputed values) went through MySQL. As I do not have access to a database on my project, I have resorted to using Underscore.js to help with the conversion (within a Backbone.js application).
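For context, the method below expects `csv` to be an array of row objects keyed by the header line, since it calls `_.keys(csv[0])` and `_.values` on each row. A minimal sketch of that shape follows. `parseRows` is a hypothetical helper written only for illustration (it is not part of D3 or Underscore), and it assumes a header line and no quoted fields or embedded commas.

```javascript
// Hypothetical helper: turns CSV text into an array of row objects.
// Assumes a header line and no quoted fields or embedded commas.
function parseRows(text) {
    var lines = text.trim().split(/\r?\n/);
    var header = lines[0].split(',');
    return lines.slice(1).map(function (line) {
        var cells = line.split(',');
        var row = {};
        // Pair each cell with the column name at the same position.
        header.forEach(function (key, i) { row[key] = cells[i]; });
        return row;
    });
}

var sampleRows = parseRows(
    'Uses,Types,Feedback\n' +
    'Use1,Type1,Feedback1\n' +
    'Use2,Type1,Feedback1\n' +
    'Use2,Type2,Feedback1'
);
// sampleRows[0] is { Uses: 'Use1', Types: 'Type1', Feedback: 'Feedback1' }
```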
Here is what I have so far, which works as intended.
// buildJSON is a method of a Backbone View that oversees the creation of the diagram
buildJSON: function( csv ) {
    var json = {
        nodes: [], // unique nodes found in data
        links: [] // links between nodes
    };
    // get unique nodes!
    var uniqueNodes = _.chain(csv).map(_.values).flatten().unique().value().sort();
    uniqueNodes.forEach(function( node ) {
        json.nodes.push({ name: node });
    });
    // map colors to nodes
    this.color.domain(uniqueNodes);
    // map links
    var links = [];
    var rMap = {};
    var keys = _.keys(csv[0]);
    for ( var i = 0; i < csv.length; i++ ) {
        for ( var j = 0; j < keys.length - 1; j++ ) {
            var relationship = csv[i][keys[j]] + '-' + csv[i][keys[j + 1]];
            rMap[relationship] = ++rMap[relationship] || 1;
        }
    }
    // create links from the linkmap
    for ( var r in rMap ) {
        if ( rMap.hasOwnProperty(r) ) {
            var rel = r.split('-');
            links.push({
                source: rel[0],
                target: rel[1],
                value: rMap[r]
            });
        }
    }
    var nodeMap = {};
    json.nodes.forEach(function( node ) { nodeMap[node.name] = node; });
    json.links = links.map(function( link ) {
        return {
            source: nodeMap[link.source],
            target: nodeMap[link.target],
            value: link.value
        };
    });
    return json;
}
Performance wouldn't be much of a concern with a small data set, but the data may contain thousands of rows and possibly up to ~10 columns.
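To make the concern concrete, here is a rough single-pass variant I could imagine, which I have not profiled. It walks the rows once, creating nodes and counting links as it goes, instead of making separate passes for unique nodes, the relationship map, and the final link mapping. `buildSankeyData` and its parameters are hypothetical names. Unlike the method above, it does not sort the nodes or set the color domain, and it keys links with a NUL separator rather than '-', on the assumption that node names might contain a hyphen.

```javascript
// Hypothetical single-pass variant (not profiled, illustration only).
// rows: array of row objects; columns: ordered list of column names.
function buildSankeyData(rows, columns) {
    var nodes = [];
    var links = [];
    var nodeByName = Object.create(null); // name -> node object
    var linkByKey = Object.create(null);  // "source\u0000target" -> link object

    function getNode(name) {
        if (!(name in nodeByName)) {
            nodeByName[name] = { name: name };
            nodes.push(nodeByName[name]);
        }
        return nodeByName[name];
    }

    rows.forEach(function (row) {
        for (var j = 0; j < columns.length - 1; j++) {
            var source = row[columns[j]];
            var target = row[columns[j + 1]];
            // NUL separator so names containing '-' cannot collide.
            var key = source + '\u0000' + target;
            if (key in linkByKey) {
                linkByKey[key].value += 1;
            } else {
                // Links point at node objects, as in the method above.
                linkByKey[key] = {
                    source: getNode(source),
                    target: getNode(target),
                    value: 1
                };
                links.push(linkByKey[key]);
            }
        }
    });
    return { nodes: nodes, links: links };
}

// Usage with the three sample rows from the top of the question.
var exampleRows = [
    { Uses: 'Use1', Types: 'Type1', Feedback: 'Feedback1' },
    { Uses: 'Use2', Types: 'Type1', Feedback: 'Feedback1' },
    { Uses: 'Use2', Types: 'Type2', Feedback: 'Feedback1' }
];
var exampleData = buildSankeyData(exampleRows, ['Uses', 'Types', 'Feedback']);
```

Nodes come out in first-seen order here, so anything that relies on the sorted order (such as the color domain) would need its own handling.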
So, long story short, my question comes in two parts:
Are there any obvious performance gains I can achieve, and
Is there a better (more efficient) way of massaging data for a Sankey Chart in D3?
I realize this is a particularly narrow issue, so I appreciate any and all help with this!