
My data looks like this:

field1,field2,value1,value2
a,b,1,1
b,a,2,2
c,a,3,5
b,c,6,7
d,a,6,7

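The ultimate goal is, for each distinct value appearing in field1 or field2, to sum value1 over the rows where it appears in field1 and value2 over the rows where it appears in field2: {a: 15 (=1+2+5+7), b: 9 (=1+2+6), c: 10 (=3+7), d: 6 (=6)}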
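For reference, the target numbers can be computed in plain JavaScript without crossfilter. This only pins down the expected result; it is not a crossfilter solution. The `rows` array is the sample data above typed in by hand, with the values already numeric.

```javascript
// Sample rows from the question (values already converted to numbers).
const rows = [
  { field1: "a", field2: "b", value1: 1, value2: 1 },
  { field1: "b", field2: "a", value1: 2, value2: 2 },
  { field1: "c", field2: "a", value1: 3, value2: 5 },
  { field1: "b", field2: "c", value1: 6, value2: 7 },
  { field1: "d", field2: "a", value1: 6, value2: 7 },
];

// For each key: add value1 where the key sits in field1,
// and value2 where it sits in field2.
function expectedTotals(data) {
  const totals = {};
  for (const r of data) {
    totals[r.field1] = (totals[r.field1] || 0) + r.value1;
    totals[r.field2] = (totals[r.field2] || 0) + r.value2;
  }
  return totals;
}

console.log(expectedTotals(rows)); // { a: 15, b: 9, c: 10, d: 6 }
```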

I don't have a good way of rearranging that data, so let's assume it has to stay like this.

Based on this previous question (thanks @Gordon), I created an array dimension using:

cf.dimension(function(d) { return [d.field1,d.field2]; }, true);

But I am a bit puzzled about how to write the custom reduce functions for my use case. The main question is: from within the reduceAdd and reduceRemove functions, how do I know which key is currently being worked on? In my case, how do I know whether I should add value1 or value2 to the sum?
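To see why the reducers cannot tell the keys apart, here is a small plain JavaScript model of how an array dimension (the `true` flag in the call above) feeds a group: each row is handed to the reducer once for every key in its array, and the reducer only receives the accumulator `p` and the row `v`. `simulateArrayGroup` is a hypothetical, simplified stand-in written for illustration (add path only, no filtering), not crossfilter's actual code.

```javascript
// Sample rows from the question (values as numbers).
const rows = [
  { field1: "a", field2: "b", value1: 1, value2: 1 },
  { field1: "b", field2: "a", value1: 2, value2: 2 },
  { field1: "c", field2: "a", value1: 3, value2: 5 },
  { field1: "b", field2: "c", value1: 6, value2: 7 },
  { field1: "d", field2: "a", value1: 6, value2: 7 },
];

// Hypothetical model of an array dimension's group (add path only):
// each row is passed to reduceAdd once per distinct key in its key array.
function simulateArrayGroup(data, keyFn, reduceAdd, reduceInitial) {
  const groups = new Map();
  for (const row of data) {
    for (const key of new Set(keyFn(row))) {
      if (!groups.has(key)) groups.set(key, reduceInitial());
      // The reducer receives (p, v) only: `key` is never passed to it.
      groups.set(key, reduceAdd(groups.get(key), row));
    }
  }
  return [...groups].map(([key, value]) => ({ key, value }));
}

// Record which rows reach each key's reducer.
const seen = simulateArrayGroup(
  rows,
  (d) => [d.field1, d.field2],
  (p, v) => {
    p.push(`${v.field1},${v.field2}`);
    return p;
  },
  () => []
);
for (const g of seen) console.log(g.key, g.value.join(" | "));
```

Row `a,b` reaches both the `a` and the `b` reducer with identical arguments, so nothing inside `reduceAdd` says whether to use `value1` or `value2`.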

(I have tagged dc.js and reductio because this could be useful for users of those libraries.)

Chapo
  • Is it possible that `field1` and `field2` are the same, and what should happen in this case? Disregarding any other fields, I think the easiest solution is probably to transform the data in the client, flattening to just one field and one value per row. However, this will cause the other fields to get counted twice. – Gordon Jul 01 '19 at 10:36
  • Field1 and field2 are always different. Flattening would be an option, but I would need two crossfilters, hence twice the memory, the post-processing, etc. – Chapo Jul 01 '19 at 11:47
  • I meant, flatten the data into a larger array and then put it into one crossfilter. Memory shouldn't be an issue unless you have a million rows or something. Not sure what post processing you're referring to. – Gordon Jul 01 '19 at 12:22
  • I cannot change the structure of my current crossfilter (based on the data in the question). So I would have to create a new crossfilter with only the flattened data. I guess it could work but I would lose the links when filtering on the first crossfilter. – Chapo Jul 01 '19 at 12:29
  • I see. I can't think of any other way to do it at the moment, but I'll keep thinking about it. It would be helpful [if the group key were available to the reduce functions](https://github.com/crossfilter/crossfilter/issues/103) - imperfect, but at least you could figure out why the row was included and use the corresponding value. – Gordon Jul 01 '19 at 13:35
  • This is starting to look sorta like [the reduce columns problem](https://stackoverflow.com/questions/24737277/dc-js-how-to-create-a-row-chart-from-multiple-columns) - if so, it's definitely possible to make it work using a groupAll, but it's tricky and might cause filtering trouble. – Gordon Jul 01 '19 at 13:36
  • Yes, the other way I could think of is to build the resulting object myself in custom reduce functions and use the chart's value accessor to get to what I need. I'll give it a try tomorrow and post here if successful. – Chapo Jul 01 '19 at 13:46
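Gordon's flattening suggestion could look roughly like this: each original row becomes two rows carrying a single `field`/`value` pair, so an ordinary (non-array) dimension with `reduceSum` would give the totals. As he notes, any other columns are copied into both rows, so counts over them would double. `flattenRows` is a hypothetical helper, and the grouping step is simulated with a plain loop rather than crossfilter.

```javascript
const rows = [
  { field1: "a", field2: "b", value1: 1, value2: 1 },
  { field1: "b", field2: "a", value1: 2, value2: 2 },
  { field1: "c", field2: "a", value1: 3, value2: 5 },
  { field1: "b", field2: "c", value1: 6, value2: 7 },
  { field1: "d", field2: "a", value1: 6, value2: 7 },
];

// Hypothetical helper: one output row per (field, value) pair.
// Other columns are copied along, which is why they get counted twice.
function flattenRows(data) {
  return data.flatMap((d) => [
    { ...d, field: d.field1, value: d.value1 },
    { ...d, field: d.field2, value: d.value2 },
  ]);
}

const flat = flattenRows(rows);

// Plain-loop equivalent of grouping by `field` and summing `value`.
// In crossfilter this step would be something like:
//   crossfilter(flat).dimension((d) => d.field).group().reduceSum((d) => d.value)
const totals = {};
for (const r of flat) totals[r.field] = (totals[r.field] || 0) + r.value;

console.log(flat.length, totals); // 10 { a: 15, b: 9, c: 10, d: 6 }
```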

2 Answers


OK, so I ended up defining the group as follows:

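// `dim` is an assumed name for the array dimension created with
// cf.dimension(function(d) { return [d.field1, d.field2]; }, true)
const group = dim.group().reduce(
    // reduceAdd: a row reaches this once per key it belongs to, without
    // saying which key, so keep a running sum for both of its keys
    (p, v) => {
        if (!p.hasOwnProperty(v.field1)) {
            p[v.field1] = 0;
        }
        if (!p.hasOwnProperty(v.field2)) {
            p[v.field2] = 0;
        }
        p[v.field1] += +v.value1;
        p[v.field2] += +v.value2;
        return p;
    },
    // reduceRemove: undo exactly what reduceAdd did
    (p, v) => {
        p[v.field1] -= +v.value1;
        p[v.field2] -= +v.value2;
        return p;
    },
    // reduceInitial: an empty map of key -> running sum
    () => {
        return {};
    }
);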

When you use the group in a chart, you just change the `valueAccessor` to `(d) => d.value[d.key]` instead of the usual `(d) => d.value`.

There is a small inefficiency, since each group's value stores running sums for every key that co-occurs with it rather than just its own, but unless you have millions of distinct values it is negligible.
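One way to sanity-check these reducers without a browser is to run them through a simplified model of an array-dimension group and apply the `d.value[d.key]` accessor. `simulateArrayGroup` is a hypothetical stand-in for crossfilter (add path only, no filtering), and the sample values are already numeric, so the `+` coercions are no-ops here.

```javascript
const rows = [
  { field1: "a", field2: "b", value1: 1, value2: 1 },
  { field1: "b", field2: "a", value1: 2, value2: 2 },
  { field1: "c", field2: "a", value1: 3, value2: 5 },
  { field1: "b", field2: "c", value1: 6, value2: 7 },
  { field1: "d", field2: "a", value1: 6, value2: 7 },
];

// Reducers from the answer (add path; reduceRemove mirrors it).
const reduceAdd = (p, v) => {
  if (!p.hasOwnProperty(v.field1)) p[v.field1] = 0;
  if (!p.hasOwnProperty(v.field2)) p[v.field2] = 0;
  p[v.field1] += +v.value1;
  p[v.field2] += +v.value2;
  return p;
};
const reduceInitial = () => ({});

// Hypothetical stand-in for an array dimension's group, add path only.
function simulateArrayGroup(data, keyFn, add, initial) {
  const groups = new Map();
  for (const row of data) {
    for (const key of new Set(keyFn(row))) {
      if (!groups.has(key)) groups.set(key, initial());
      groups.set(key, add(groups.get(key), row));
    }
  }
  return [...groups].map(([key, value]) => ({ key, value }));
}

const groups = simulateArrayGroup(rows, (d) => [d.field1, d.field2], reduceAdd, reduceInitial);

// Chart-side accessor: pick the entry matching the group's own key.
const valueAccessor = (d) => d.value[d.key];
for (const d of groups) console.log(d.key, valueAccessor(d)); // one line each: a 15, b 9, c 10, d 6

// The extra entries are the inefficiency: group "a" also carries b, c and d.
console.log(groups[0].value); // { a: 15, b: 3, c: 3, d: 6 }
```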

Chapo

You always have a good way to rearrange the data: after you have fetched it and before you feed it to crossfilter ;)

In fact, it's pretty much mandatory as soon as you handle non-string fields (numeric or date).
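Since CSV parsers typically hand back every field as a string, that client-side step can be as small as mapping over the parsed rows once before building the crossfilter. `toTypedRow` is a hypothetical helper, and `parsed` is hand-written data standing in for a parser's output.

```javascript
// Rows as a CSV parser would typically return them: every field is a string.
const parsed = [
  { field1: "a", field2: "b", value1: "1", value2: "1" },
  { field1: "c", field2: "a", value1: "3", value2: "5" },
];

// Hypothetical helper: coerce the numeric columns once, up front,
// so reducers don't need `+v.value1` on every call.
function toTypedRow(d) {
  return { ...d, value1: Number(d.value1), value2: Number(d.value2) };
}

const typed = parsed.map(toTypedRow);

console.log(parsed[1].value1 + parsed[1].value2); // "35" (string concatenation)
console.log(typed[1].value1 + typed[1].value2); // 8
```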

You can do a reduceSum over multiple fields:

dimension.group().reduceSum(function(d) { return +d.value1 + +d.value2; });
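Running this accessor through a plain JavaScript model of an array-dimension group shows why it does not give the requested numbers: every row adds `value1 + value2` to each of its keys, regardless of which field the key came from. This is a simplified simulation for illustration, not crossfilter itself.

```javascript
const rows = [
  { field1: "a", field2: "b", value1: 1, value2: 1 },
  { field1: "b", field2: "a", value1: 2, value2: 2 },
  { field1: "c", field2: "a", value1: 3, value2: 5 },
  { field1: "b", field2: "c", value1: 6, value2: 7 },
  { field1: "d", field2: "a", value1: 6, value2: 7 },
];

// Simplified model: each row contributes the same sum to every key in [field1, field2].
const sums = {};
for (const d of rows) {
  for (const key of new Set([d.field1, d.field2])) {
    sums[key] = (sums[key] || 0) + (+d.value1 + +d.value2);
  }
}

console.log(sums); // { a: 27, b: 19, c: 21, d: 13 } instead of { a: 15, b: 9, c: 10, d: 6 }
```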
Xavier
  • But that doesn't solve my problem, does it? You're just summing both values. – Chapo Jul 01 '19 at 06:30
  • "In fact, it's pretty much mandatory as soon as you handle non-string fields (numeric or date)": I don't agree. If your server is properly set up, you can send back the right types without post-processing. That's a good thing, by the way: if you're handling millions of lines, you don't want to do the conversion client-side. – Chapo Jul 01 '19 at 06:32
  • Well, it's a limitation of the CSV format that the fields are untyped and thus strings. I think type inference is coming in a newer `d3.csv`. JSON does allow you to express a few types natively. – Gordon Jul 01 '19 at 09:49
  • Ah, so you want to return `d.value1` if the key comes from `field1`? I misunderstood your question. The reducer doesn't know the key, so I don't see a solution right now ;( – Xavier Jul 01 '19 at 15:37