With the following data:

const now = new Date
const data = [
  { player: 'bob', color: 'blue', date: new Date(+now + 1000) },
  { player: 'bill', color: 'green', date: new Date(+now + 2000) },
  { player: 'bob', color: 'red', date: new Date(+now + 3000) },
  { player: 'barbara', color: 'blue', date: new Date(+now + 4000) },
  { player: 'barbara', color: 'cyan', date: new Date(+now + 8000) },
  { player: 'barbara', color: 'magenta', date: new Date(+now + 10000) },
  { player: 'barbara', color: 'yellow', date: new Date(+now + 20000) },
]

I want a count reduction on the color dimension that counts only each player's first color. (Note: "first" is with respect to the date dimension, which may be filtered.) I have been trying to get this to work with reductio's exception feature, but it does not give the expected results:

const reducer = reductio()
reducer.exception('player').exceptionCount(true)
reducer(colorGroup)

The results should look like this:

blue,2     # bob and barbara
cyan,0
green,1    # bill
magenta,0
red,0
yellow,0

Another example, with the date dimension filtered to now+2100..now+20000 (i.e. the first two rows are filtered out):

blue,1     # barbara
cyan,0
green,0    # (bill's first color is not counted because it is outside the date range)
magenta,0
red,1      # bob (blue is his first color overall, red is his first in this date range)
yellow,0
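
To make the intended semantics concrete, here is a minimal plain-JavaScript sketch of the computation, with no crossfilter involved. The helper name `firstColorCounts` and its inclusive `from`/`to` bounds are assumptions made for illustration only; it is not how I expect the crossfilter version to look.

```javascript
// Hypothetical helper, not part of crossfilter or reductio.
// For each color, counts the players whose earliest row inside
// the inclusive date range [from, to] has that color.
function firstColorCounts(rows, from, to) {
  // Seed every color with zero so colors that are never "first" still appear.
  const counts = {};
  rows.forEach(r => { counts[r.color] = 0; });

  const seen = new Set(); // players already counted
  rows
    .filter(r => r.date >= from && r.date <= to) // filter() copies, so sort() is safe
    .sort((a, b) => a.date - b.date)             // oldest first
    .forEach(r => {
      if (!seen.has(r.player)) {
        seen.add(r.player);
        counts[r.color] += 1;
      }
    });
  return counts;
}
```

With the rows above, `firstColorCounts(data, new Date(+now + 2100), new Date(+now + 20000))` should give 1 for blue and red and 0 for every other color, matching the second table.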

Note: I have other groupings which use all of the rows so I can't just pre-filter the list before loading it into crossfilter.

Is there some way to use reductio() for this? Or some example of how to do this "grouping upon grouping" with crossfilter directly?

EDIT

Link to jsfiddle showing unexpected result: https://jsfiddle.net/qt5jxjm1/

oof
  • "Double reductions" are always tricky. This is an interesting problem. Any chance you could describe your actual use for this? (Just out of curiosity.) – Gordon Sep 26 '16 at 17:03
  • I actually was able to get something kind-of working here: https://jsfiddle.net/4j3gs57y/ using a fake group. It displays the correct data when filtering on date, but it falls over when filtering by other dimensions (e.g. its own chart changes when brushing over it, because the data is actually coming from another group which is being filtered by the brushing). It is like I want the date filtering to occur at an earlier stage than the rest. – oof Sep 26 '16 at 18:04
  • That's a good point. You'll be filtering by color in this example, but your source data comes from the date dimension, so that's another hurdle. I guess you could pull data from a color dimension and then sort internally by date... – Gordon Sep 26 '16 at 18:36
  • Yeah, this is not what reductio exception aggregation does. Your example will only count the first record per player *in a group*. Since you group by color and each player only has one record per color, you are still counting all records. Like Gordon, I'm curious about the use-case here. Why do you only want to show one record per player? How do you determine which record is the first one? – Ethan Jewett Sep 26 '16 at 19:10
  • Would it make any sense to have two crossfilter instances? One for exploring the "all time" data and one for exploring the "latest per-player" data? As the date brush changes on the "latest per-player" one, manually add/remove records from that crossfilter. I think logically this is what I am trying to do at least, i.e. be able to slice and dice the same dimensions on both: an entire history of events, and also just the latest event per person. – oof Sep 26 '16 at 19:11
  • @EthanJewett the single record per player is the one with the highest timestamp (dupe timestamp won't happen but highest id could also be used if that was an issue). Edit: I should clarify "highest timestamp *within the filtered date range*" (otherwise I would just pre-process the data and add an isLatest boolean.) – oof Sep 26 '16 at 19:19
  • I went ahead with multiple crossfilter instances and ended up with something a little rough around the edges, and kind of wild in implementation... but it works and it'll do for my use case: https://jsfiddle.net/kv22vfkk/3/ – oof Sep 26 '16 at 23:14
  • There was a bug (didn't re-apply filters after changing the crossfilter data): Updated fiddle: https://jsfiddle.net/kv22vfkk/4/ – oof Sep 26 '16 at 23:36

1 Answer

I'm not sure if crossfilter is going to help you very much here - it doesn't really consider the order of values, and it certainly doesn't have a way to sort by one key and then bin by another.

Here is a fake group that will do close to what you want, by using another dimension for ordering, and a couple of accessors for the group key and the "first field", i.e. the field you want to look for the first of:

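function double_reduce(dim, groupf, firstf) {
  return {
    all: function() {
      // Filtered records in ascending order of the ordering dimension.
      var recs = dim.bottom(Infinity);
      var hit = {}, bins = {};
      recs.forEach(function(r) {
        var fkey = firstf(r), gkey = groupf(r);
        // Count a record only the first time its "first field" value is seen.
        var count = hit[fkey] ? 0 : 1;
        hit[fkey] = true;
        bins[gkey] = (bins[gkey] || 0) + count;
      });
      // Return the same {key, value} shape as a crossfilter group's all().
      return Object.keys(bins).map(function(k) {
        return {key: k, value: bins[k]};
      });
    }
  };
}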

Use it like this:

var dubred_group = double_reduce(dateDim,
    function(r) { return r.color;}, function(r) { return r.player; });

One thing this can't do is deliver zeros for any values that are filtered out. Ordinarily crossfilter does incremental adds and removes and I don't see how that would be possible here.

So the results without any dates filtered look good:

[
  {
    "key": "blue",
    "value": 2
  },
  {
    "key": "green",
    "value": 1
  },
  {
    "key": "red",
    "value": 0
  },
  {
    "key": "cyan",
    "value": 0
  },
  {
    "key": "magenta",
    "value": 0
  },
  {
    "key": "yellow",
    "value": 0
  }
]

But there is a missing bin for the filtered case, because green does not appear in the filtered data:

[
  {
    "key": "red",
    "value": 1
  },
  {
    "key": "blue",
    "value": 1
  },
  {
    "key": "cyan",
    "value": 0
  },
  {
    "key": "magenta",
    "value": 0
  },
  {
    "key": "yellow",
    "value": 0
  }
]

This could probably be fixed, but I thought I would post this for feedback.
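
One possible way to fill in the missing bin, as an untested-in-the-fiddle sketch: seed the bins with zeros for every key before walking the records. The function name `double_reduce_zeros` and the extra `allKeys` argument are hypothetical additions for illustration, not part of crossfilter or of the fiddle.

```javascript
// Sketch only: same idea as double_reduce, but every key in `allKeys`
// (a hypothetical extra argument) gets a bin, even when its count is zero.
function double_reduce_zeros(dim, groupf, firstf, allKeys) {
  return {
    all: function() {
      // Filtered records in ascending order of the ordering dimension.
      var recs = dim.bottom(Infinity);
      var hit = {}, bins = {};
      // Pre-seed so keys absent from the filtered data still appear.
      allKeys.forEach(function(k) { bins[k] = 0; });
      recs.forEach(function(r) {
        var fkey = firstf(r), gkey = groupf(r);
        var count = hit[fkey] ? 0 : 1;
        hit[fkey] = true;
        bins[gkey] = (bins[gkey] || 0) + count;
      });
      return Object.keys(bins).map(function(k) {
        return {key: k, value: bins[k]};
      });
    }
  };
}
```

The key list has to come from somewhere that ignores the date filter. Assuming a color dimension named `colorDim` exists (an assumption; the name does not appear in the fiddle), an ordinary group on it should work, since a group's `all()` keeps every key: `colorDim.group().all().map(function(kv) { return kv.key; })`. Bins then come out in the order of `allKeys` rather than in date order.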

Demo in a fiddle: http://jsfiddle.net/zdq4rj13/7/

Gordon