
I'm trying to tally up the hours in a data tree structure. I can add up the hours of the nodes directly under a parent node, but I can't get the totals to include the hours assigned to the parent nodes themselves. Any suggestions would be great.

This is what I am getting:

           levelName hours totalhours
1 Ned                   NA          1
2  °--John               1          3
3      °--Kate           1          3
4          ¦--Dan        1          1
5          ¦--Ron        1          1
6          °--Sienna     1          1

This is what I'm looking for:

           levelName hours totalHours
1 Ned                   NA          5
2  °--John               1          5
3      °--Kate           1          4
4          ¦--Dan        1          1
5          ¦--Ron        1          1
6          °--Sienna     1          1

Here's my code:

# Install package
install.packages('data.tree')
library(data.tree)

# Create data frame
to <- c("Ned", "John", "Kate", "Kate", "Kate")
from <- c("John", "Kate", "Dan", "Ron", "Sienna")
hours <- c(1,1,1,1,1)
df <- data.frame(from,to,hours)

# Create data tree
tree <- FromDataFrameNetwork(df)
print(tree, "hours")

# Get running total of hours that includes all nodes and children values.
tree$Do(function(x) x$total <- Aggregate(x, "hours", sum), traversal = "post-order")
print(tree, "hours", runningtotal = tree$Get(Aggregate, "total", sum))
Bridgbro

3 Answers


You could simply use a recursive function:

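# Each node's total = its own hours + the totals returned by its children.
# Requires the purrr package (called as purrr::map_dbl).
myApply <- function(node) {
  # The assignment's value is also the function's return value,
  # which is what map_dbl collects from each child.
  node$totalHours <- 
    sum(c(node$hours, purrr::map_dbl(node$children, myApply)), na.rm = TRUE)
}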
myApply(tree)
print(tree, "hours", "totalHours")

Result:

           levelName hours totalHours
1 Ned                   NA          5
2  °--John               1          5
3      °--Kate           1          4
4          ¦--Dan        1          1
5          ¦--Ron        1          1
6          °--Sienna     1          1
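For comparison, the same recursion (own hours plus the children's totals) can be sketched in base R without data.tree or purrr, working directly on the edge list from the question. This is a minimal sketch under stated assumptions: `to` is the parent and `from` the child, the root has no row and therefore contributes 0 hours, the tree has no cycles, and `subtree_total` is a hypothetical helper name.

```r
# Edge list from the question: each row links a child ("from") to its parent ("to").
to    <- c("Ned", "John", "Kate", "Kate", "Kate")
from  <- c("John", "Kate", "Dan", "Ron", "Sienna")
hours <- c(1, 1, 1, 1, 1)
df <- data.frame(from, to, hours, stringsAsFactors = FALSE)

# Hypothetical helper (assumes an acyclic tree): a node's total is its own
# hours plus the totals of all of its children, computed recursively.
subtree_total <- function(node, df) {
  own  <- sum(df$hours[df$from == node])  # 0 for the root, which has no row
  kids <- df$from[df$to == node]          # direct children of this node
  own + sum(vapply(kids, subtree_total, numeric(1), df = df))
}

nodes  <- c("Ned", df$from)
totals <- vapply(nodes, subtree_total, numeric(1), df = df)
print(totals)
```

`stringsAsFactors = FALSE` keeps the names as plain strings, so the comparisons and the recursive `vapply` calls work the same on R versions before and after 4.0.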

Edit: filling two variables in a single recursive pass:

# Create data frame
to <- c("Ned", "John", "Kate", "Kate", "Kate")
from <- c("John", "Kate", "Dan", "Ron", "Sienna")
hours <- c(1,1,1,1,1)
hours2 <- 5:1
df <- data.frame(from,to,hours, hours2)

# Create data tree
tree <- FromDataFrameNetwork(df)
print(tree, "hours", "hours2")

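myApply <- function(node) {
  # Recurse first: each child returns list(totalHours, totalHours2)
  res.ch <- purrr::map(node$children, myApply)
  # Own value plus the 1st (resp. 2nd) element returned by every child
  a <- node$totalHours <- 
    sum(c(node$hours,  purrr::map_dbl(res.ch, 1)), na.rm = TRUE)
  b <- node$totalHours2 <- 
    sum(c(node$hours2, purrr::map_dbl(res.ch, 2)), na.rm = TRUE)
  list(a, b)
}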
myApply(tree)
print(tree, "hours", "totalHours", "hours2", "totalHours2")

Result:

           levelName hours totalHours hours2 totalHours2
1 Ned                   NA          5     NA          15
2  °--John               1          5      5          15
3      °--Kate           1          4      4          10
4          ¦--Dan        1          1      3           3
5          ¦--Ron        1          1      2           2
6          °--Sienna     1          1      1           1
F. Privé
  • That's very cool (and more generic). I have one question. If we had more than one column with numeric data, and would like to create corresponding columns with aggregated data, would we have to create an "apply" function for each column (that's what I did), or can all the columns be created using only one recursive function (I didn't succeed in that)? – Brani Nov 02 '17 at 07:36
  • @Brani I think you could fill many variables in the function and return a list with all of them, and maybe make use of `map2` or `pmap` instead of `map`. Do you have an example in mind? – F. Privé Nov 02 '17 at 10:20
  • It suffices to add another variable to df (like 'hours', but with different numbers) and use the same example. – Brani Nov 02 '17 at 11:35
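The "one recursive function for many columns" idea from these comments can also be sketched in base R: the recursion returns a named vector with one total per column, and each node adds its children's vectors to its own. This is a sketch under assumptions, not the answer's method: it works on the edge list rather than the data.tree object, uses a plain loop instead of `map2`/`pmap`, assumes an acyclic tree, and `subtree_totals` is a hypothetical name.

```r
to   <- c("Ned", "John", "Kate", "Kate", "Kate")
from <- c("John", "Kate", "Dan", "Ron", "Sienna")
df <- data.frame(from, to, hours = c(1, 1, 1, 1, 1), hours2 = 5:1,
                 stringsAsFactors = FALSE)

# Hypothetical helper: returns a named vector with one subtree total per column in `cols`.
subtree_totals <- function(node, df, cols) {
  # Own values; a 0-row selection (the root) gives zeros for every column
  own  <- colSums(df[df$from == node, cols, drop = FALSE])
  kids <- df$from[df$to == node]
  for (kid in kids) own <- own + subtree_totals(kid, df, cols)
  own
}

nodes  <- c("Ned", df$from)
# One row per node, one column per aggregated variable
totals <- t(sapply(nodes, subtree_totals, df = df, cols = c("hours", "hours2")))
print(totals)
```

Adding another variable only means adding its name to `cols`; the recursion itself does not change.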

The caching of Aggregate values during Do seems to work only for the same field, so first copy hours into totalHours and then aggregate totalHours itself in post-order:

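# Step 1: initialise every node's total with its own hours
tree$Do(function(node) node$totalHours = node$hours)

# Step 2 (post-order, so children are finished first): add the children's
# totals to the node's own value; leaves simply keep their own value
tree$Do(function(node) node$totalHours = sum(if(!node$isLeaf) node$totalHours else 0,
                                             Aggregate(node, "totalHours", sum)),
        traversal = "post-order")
print(tree, "hours", "totalHours")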
#           levelName hours totalHours
#1 Ned                   NA          5
#2  °--John               1          5
#3      °--Kate           1          4
#4          ¦--Dan        1          1
#5          ¦--Ron        1          1
#6          °--Sienna     1          1
eddi

The Aggregate function of the data.tree package is especially useful if you want to recursively sum up children. In your case, there are two things you want to do:

  1. Sum up children plus own value
  2. Store sum in separate variable

A way to do this is:

library(data.tree)

# Create data frame
to <- c("Ned", "John", "Kate", "Kate", "Kate")
from <- c("John", "Kate", "Dan", "Ron", "Sienna")
hours <- c(1,1,1,1,1)
df <- data.frame(from,to,hours)

# Create data tree
tree <- FromDataFrameNetwork(df)
print(tree, "hours")

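# Get running total of hours that includes all nodes and children values.
# Post-order makes sure each child's "total" is set before its parent is visited.
tree$Do(function(x) {
  own <- ifelse(is.null(x$hours), 0, x$hours)  # the root (Ned) has no hours
  x$total <- own + sum(Get(x$children, "total"))
}, traversal = "post-order")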
print(tree, "hours", "total")
Christoph Glur
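For comparison, the children-before-parents ordering that the post-order Do calls above rely on can be sketched without recursion in base R: visit nodes from the deepest level upward and push each node's total into its parent. This is a sketch under assumptions: it uses the edge list directly (child in `from`, parent in `to`), orders nodes by depth rather than by a true post-order traversal (both guarantee children are handled before parents), and assumes a single-rooted tree without cycles, since a cycle would make the depth loop run forever.

```r
to    <- c("Ned", "John", "Kate", "Kate", "Kate")
from  <- c("John", "Kate", "Dan", "Ron", "Sienna")
hours <- c(1, 1, 1, 1, 1)

parent <- setNames(to, from)            # child -> parent lookup
nodes  <- c("Ned", from)
total  <- setNames(c(0, hours), nodes)  # the root has no hours, so it starts at 0

# Depth of each node = number of steps up to the root (assumes no cycles).
depth <- vapply(nodes, function(n) {
  d <- 0
  while (!is.na(parent[n])) { n <- parent[[n]]; d <- d + 1 }
  d
}, numeric(1))

# Deepest nodes first: each node's total is complete before it is added to its parent.
for (n in nodes[order(depth, decreasing = TRUE)]) {
  if (!is.na(parent[n])) total[[parent[[n]]]] <- total[[parent[[n]]]] + total[[n]]
}
print(total)
```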