
I've got some crossfilter data with dates (d) and values (v):

[
 {d: "2013-07-26T00:00:00.000Z", v: 2.5},
 {d: "2013-07-25T00:00:00.000Z", v: 2.64},
 // ...and many more
]

I've created a group for the months in Crossfilter (crossfilter2@1.4.5):

months = cf.dimension((d) => {
    const dateObj = new Date(d.d);
    // use 1-12 instead of 0-11
    return dateObj.getMonth() + 1;
});

monthsGroup = months.group();

So monthsGroup.all() returns an array of 12 objects, aggregated by month. I want those objects to include the min, max, and median, as well as the 25th and 75th percentile. Reductio (reductio@0.6.3) helps with the min, max, and median out of the box, so I've added a custom aggregator to add the 75th and 25th percentiles.

The following code works, but it's very slow:

const monthReducer = reductio()
.valueList(d => d.v)
.min(true)
.max(true)
.median(true)
.count(true)
.custom({
    add(p) {
        const valueList = p.valueList;
        p.p75 = getPercentile(valueList, 75);
        p.p25 = getPercentile(valueList, 25);
        return p;
    },
    remove(p) {
        const valueList = p.valueList;
        p.p75 = getPercentile(valueList, 75);
        p.p25 = getPercentile(valueList, 25);
        return p;
    },
    initial(p) {
        p.p75 = undefined;
        p.p25 = undefined;
        return p;
    },
});
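The getPercentile helper isn't shown here. A minimal sketch of what such a helper might look like, assuming linear interpolation between the closest ranks and assuming valueList may arrive unsorted (both are assumptions, not necessarily how the original helper behaves):

```javascript
// Hypothetical helper: linearly interpolated percentile of a numeric array.
// Assumes `values` may be unsorted, so it sorts a copy on every call.
function getPercentile(values, percentile) {
    if (!values || values.length === 0) return undefined;
    const sorted = values.slice().sort((a, b) => a - b);
    // Fractional rank in the range [0, n - 1]
    const rank = (percentile / 100) * (sorted.length - 1);
    const lower = Math.floor(rank);
    const upper = Math.ceil(rank);
    const weight = rank - lower;
    // Interpolate between the two neighbouring values
    return sorted[lower] + (sorted[upper] - sorted[lower]) * weight;
}
```

If the helper does sort on each call like this sketch, invoking it twice inside add and remove means two sorts per record, which would be consistent with the slowdown described.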

If I remove the .custom block, it's much faster. The custom reducer recalculates both percentiles every time a record is added or removed, which is unnecessary because they only depend on the final valueList. Reductio has a barely-documented .post() hook that I think would do the trick here, but I can't get it working.

UPDATE: I got the post-processing hook callback to run, but it doesn't work the way I expected.

I tried registering a new post processor with an undocumented method I saw in the source:

// register post-processing function to add percentiles
reductio.registerPostProcessor('addPercentiles', (prior) => {
    const all = prior();
    return () => {
        const updated = all.map((e) => {
            const valueList = e.value.valueList;
            e.value.p75 = getPercentile(valueList, 75);
            e.value.p25 = getPercentile(valueList, 25);
            return e;
        });
        return updated;
    };
});

and adding it to the post() hook:

// run post-processing to add the 25th & 75th %iles
this.monthsGroup.post().addPercentiles()();

This appears to do what I want, but only once. It doesn't re-run the post hooks when a filter is applied to another dimension.

If median is just the 50th percentile, it should be trivial to also get the 25th and 75th. I feel like I'm close, but I'm obviously doing something wrong. How can I add these aggregations to the reductio reducer?

Gordon
DMack
  • I figured out the post-processor function must return a function. I updated my question, but now I've got a different problem; the hook doesn't happen when I thought it would. – DMack Apr 26 '18 at 17:15
  • I think the hook should run when you call `group.top` or `group.all`. If you are relying on the groups to update automatically with new values, I don't think that will work with the post-processing hooks. Can you confirm? – Ethan Jewett Apr 30 '18 at 20:23
  • I don't think I'm using the hook correctly. If I add a `console.log` for instance in the `getPercentile` function, it only gets printed out when the reducer is initially applied to the group. If I filter other dimensions and do `monthsGroup.all()`, `monthsGroup.top(5)`, etc., the percentiles aren't being recalculated. edit: As long as I have the correct `valueList`, I can manually add the percentiles after the fact, but I'd rather not, especially if the median works fine. – DMack May 01 '18 at 20:34

1 Answer


One solution is to just add the quantiles manually, right before rendering the chart. I have a formatData function that does date/time formatting and restructures the data to be more d3-friendly. Since valueList is still available in every element of the array, I just added a couple of lines in there to calculate the 25th and 75th percentiles.

Not ideal, but very easy!
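A rough sketch of that approach, for illustration only. The names addQuartiles and getPercentile are hypothetical stand-ins (the real formatData isn't shown), and the percentile helper assumes linear interpolation over a possibly unsorted valueList:

```javascript
// Hypothetical percentile helper (linear interpolation, sorts a copy).
function getPercentile(values, percentile) {
    if (!values || values.length === 0) return undefined;
    const sorted = values.slice().sort((a, b) => a - b);
    const rank = (percentile / 100) * (sorted.length - 1);
    const lower = Math.floor(rank);
    const upper = Math.ceil(rank);
    return sorted[lower] + (sorted[upper] - sorted[lower]) * (rank - lower);
}

// Hypothetical formatting step: takes the array returned by group.all()
// and returns new entries with p25 and p75 added to each value.
function addQuartiles(groups) {
    return groups.map(({ key, value }) => ({
        key,
        value: {
            ...value, // keeps min, max, median, count, valueList
            p25: getPercentile(value.valueList, 25),
            p75: getPercentile(value.valueList, 75),
        },
    }));
}

// Usage, called each time the chart is about to render:
// const rows = addQuartiles(monthsGroup.all());
```

Because this runs on whatever monthsGroup.all() returns at render time, the percentiles are computed once per group per render rather than once per record. Returning copies instead of writing to the group's own value objects is a design choice in this sketch, not something the original code necessarily does.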

DMack
  • 871
  • 2
  • 9
  • 21