I like the user experience of cubism, and would like to use this on top of a backend we have.

I've read the API docs and some of the code; most of this seems to be abstracted away. How exactly could I begin to use other data sources?

I have a data store covering about 6k individual machines, with 5-minute precision on around 100 or so stats.

I would like to query a web app with a specific identifier for a machine and then render a dashboard similar to Cubism's by querying a specific MongoDB data store.

Writing the web app or querying Mongo isn't the issue.

The issue is that Cubism seems to require querying whatever data store you use for each individual data point (say you have 100 stats across a window of a week...expensive).

Is there another way I could leverage this tool to look at data that gets loaded using something similar to the code below?

var data = [];
d3.json("/initial", function(json) { data = data.concat(json); }); // concat returns a new array, so reassign
d3.json("/update", function(json) { data.push(json); });

1 Answer

Cubism takes care of initialization and update for you: the initial request covers the full visible window (start to stop, typically 1,440 data points), while each subsequent request asks only for the few most recent values (7 data points).

Take a look at context.metric for how to implement a new data source. The simplest possible implementation is like this:

var foo = context.metric(function(start, stop, step, callback) {
  // start and stop are Dates bounding the requested window; step is in milliseconds
  d3.json("/data", function(data) {
    if (!data) return callback(new Error("unable to load data"));
    callback(null, data); // an array of values, one per step in [start, stop)
  });
});

You would extend this to change the "/data" URL as appropriate, passing in the start, stop and step times, and whatever else you want to use to identify a metric. For example, both Cube and Graphite use a metric expression as an additional query parameter.
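
For example, here is a minimal sketch of that extension, assuming a hypothetical "/data" endpoint that takes a machine id, a stat name, and the window bounds as query parameters (the endpoint and parameter names are made up, not part of cubism):

function machineMetric(machineId, stat) {
  return context.metric(function(start, stop, step, callback) {
    d3.json("/data?machine=" + encodeURIComponent(machineId)
        + "&stat=" + encodeURIComponent(stat)
        + "&start=" + (+start)   // start and stop are Dates; + coerces to epoch milliseconds
        + "&stop=" + (+stop)
        + "&step=" + step, function(data) {
      if (!data) return callback(new Error("unable to load data"));
      callback(null, data);      // an array of values, one per step
    });
  }, stat);                      // the second argument names the metric
}

A metric like machineMetric("web-042", "cpu") then plugs into context.horizon() like any other.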

– mbostock
  • just making sure, this is intended for each new metric? So each client connected to this will be making x queries to your database for x metrics. Is there no easy way to reduce this using cubism, e.g. calling a chart and then having an accessor function? – Miles McCrocklin May 10 '12 at 07:09
  • Sure, you could write an alternate metric implementation that fetches multiple metrics in a batch, but typically it's not worth it. Our dashboards often make hundreds of concurrent requests to Graphite (which get partially serialized by the browser, since it won't make more than 4 or 8 concurrent requests per host), and there's no performance issue. To merge concurrent requests, you'd put multiple requests into a queue and use a timeout to make a combined request (see the batching sketch after these comments). – mbostock May 10 '12 at 17:05
  • Just to run it by someone who's got experience in this area, tell me if you think this is worth it: I have around 500GB of data about n machines. The db is indexed by timestamp, by machine id, and by the combination of the two. One query (this is MongoDB) takes around 12 seconds to get a sorted list of 1440 results for 1 machine. So x * 12 seconds = load time, where x is the number of metrics. – Miles McCrocklin May 10 '12 at 18:51
  • The alternatives I've looked into are setting up Cube or Graphite and loading the data into them for a given machine id. This is a great caching method, but these queries don't occur often enough to cache for multiple users, and I'm pretty sure doing this on page load would perform poorly (if not worse than the x * 12 method). – Miles McCrocklin May 10 '12 at 18:53
  • I'm surprised it's taking 12 seconds to return 1,440 rows if you have the indexes set up correctly. If you're looking for values for a specific machine, make sure the index is ordered correctly: machine id, then timestamp. That way, the DB can use the second part of the index (time) for a range query. Also, if you are storing a lot of data per row, tell Mongo which fields you want to return, or store smaller objects in separate collections (see the index sketch below). – mbostock May 10 '12 at 21:10
  • Yea, we've fixed it; both of those were taken care of. However, someone else had documented the box as being indexed by timestamp when it was not. He started indexing around 4 hours ago, and now it's taking roughly .25 seconds per query. But that still means your load time is directly correlated with the number of metrics you are dealing with: .25 * 100 is still roughly a 25 second load time (I'm not sure how much this matters to someone using this system, especially since it's internal). – Miles McCrocklin May 10 '12 at 21:57
  • Also, I just wanted to thank you for how helpful you continue to be, both here and on the d3 Google group. – Miles McCrocklin May 10 '12 at 21:58
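
To make the batching idea from the comments concrete, here is a sketch of the queue-plus-timeout approach described above, assuming a hypothetical "/batch" endpoint that accepts several metric names for one window and returns values keyed by name (the endpoint and its parameters are invented; cubism itself only defines the per-metric request):

var queue = [], flushTimeout;

function batchedMetric(name) {
  return context.metric(function(start, stop, step, callback) {
    queue.push({name: name, start: +start, stop: +stop, step: step, callback: callback});
    clearTimeout(flushTimeout);
    flushTimeout = setTimeout(flush, 10); // wait a tick so concurrent requests coalesce
  }, name);
}

function flush() {
  var pending = queue;
  queue = [];
  // Cubism asks every metric for the same window, so take the bounds from the first request.
  d3.json("/batch?names=" + pending.map(function(d) { return d.name; }).join(",")
      + "&start=" + pending[0].start
      + "&stop=" + pending[0].stop
      + "&step=" + pending[0].step, function(data) {
    pending.forEach(function(d) {
      if (!data || !data[d.name]) return d.callback(new Error("unable to load data"));
      d.callback(null, data[d.name]); // each callback gets its own metric's values
    });
  });
}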
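
And for the indexing advice, a mongo-shell sketch (the shell is JavaScript) of the compound index and a projected range query; the collection and field names ("samples", "machine", "ts", "value") are hypothetical, and Mongo of that era spelled createIndex as ensureIndex:

// Compound index: machine id first, then timestamp, so a per-machine
// time-range query can walk a contiguous slice of the index.
db.samples.createIndex({ machine: 1, ts: 1 });

// Projected range query: return only the fields the chart needs.
db.samples.find(
  { machine: "web-042", ts: { $gte: ISODate("2012-05-01"), $lt: ISODate("2012-05-08") } },
  { _id: 0, ts: 1, value: 1 }
).sort({ ts: 1 });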