R dplyr resolve variable in conditional filter

Question

I am trying to filter based on a variable value, and have tried multiple combinations of filter_, dots and quotes to no avail.

As an example, I have a

runlist = c(1, 2, 3, 4, 5)

and a dataframe boo

run <- rep(seq(5), 3)
edge1 <- sample(20, 15)
edge2 <- sample(20, 15)
weights <- sample(50, 15)
boo <- as.data.frame(cbind(run, edge1, edge2, weights))

and I want to filter boo iteratively, something like

for (i in runlist) {
    bop <- boo %>% filter( run == i )
    str(boo)
}

I suspect I'll be hearing about not using for loops in R and using group_by(run) instead, but I'm sending this data to igraph and need to further subset the dataset to just edges and weights, thus losing the grouping variable, as in

bop <- boo %>% filter( run == i ) %>% select( edge1, edge2, weights )

I will create a network graph and find density and centrality values for each run.

bing <- graph.data.frame(bop)

How do I get the i in the conditional filter to resolve as the correct index?
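For reference, inside filter() a bare name is looked up among the data frame's columns first and then in the calling environment, so `run == i` should pick up the loop value as long as boo has no column named i. Note also that the loop above calls str(boo), which prints the unfiltered data rather than bop. The following is a minimal sketch of the loop, assuming a dplyr version that provides the `.env` pronoun (dplyr 1.0 or later); the list name `subsets` and the `set.seed()` call are illustrative additions, not part of the original question.

```r
library(dplyr)

set.seed(1)  # assumption: only here so the sketch is reproducible
boo <- data.frame(
  run     = rep(seq(5), 3),
  edge1   = sample(20, 15),
  edge2   = sample(20, 15),
  weights = sample(50, 15)
)
runlist <- unique(boo$run)

# Keep one filtered edge list per run in a named list (hypothetical name)
subsets <- list()
for (i in runlist) {
  subsets[[as.character(i)]] <- boo %>%
    filter(run == .env$i) %>%        # .env$i: take i from outside the data frame
    select(edge1, edge2, weights)
}

str(subsets[["1"]])  # inspect the filtered result, not the original boo
```

Each element of `subsets` can then be passed to igraph on its own.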

@Nate Day Wow, yes, that works. Could you explain to me why your suggestion works while `runlist = unique(boo$run)` and indexing on `runlist` does not? And how do I correctly @ your handle which has a space? — zazizoma, May 18 '17 at 04:07

Heisenberg · Accepted Answer · 2017-05-18T16:42:53.970


My answer is not about "resolving a variable in a conditional filter", but there's a much easier way to do what you want to do.

The big idea is to split the data frame based on the variable run, and map a function onto each of those pieces. This function takes a piece of the data frame and spits out an igraph.

The following code accomplishes the above, storing a list of graphs in the column graph. (It's a list-column; see the R for Data Science book for more.)

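library(dplyr)
library(tidyr)   # nest()
library(purrr)   # map()
library(igraph)  # graph.data.frame(), graph.density()
boo %>%
  group_by(run) %>%
  nest() %>%
  mutate(graph = map(data, function(x) graph.data.frame(x %>% select(edge1, edge2, weights)))) %>%
  mutate(density = map(graph, function(x) graph.density(x)))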
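Since the question also asks for centrality, here is a hedged variant of the same pipeline. It is a sketch, not the answer's original code: it assumes igraph 1.0 or later and uses the newer function names graph_from_data_frame(), edge_density() and degree() in place of the dotted ones. It also uses map_dbl() so that density comes back as a plain numeric column. The names `stats` and `degree_centrality`, the choice of degree as the centrality measure, and the `set.seed()` call are illustrative assumptions.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(igraph)

set.seed(1)  # assumption: only here so the sketch is reproducible
boo <- data.frame(
  run     = rep(seq(5), 3),
  edge1   = sample(20, 15),
  edge2   = sample(20, 15),
  weights = sample(50, 15)
)

stats <- boo %>%
  group_by(run) %>%
  nest() %>%        # one row per run, edges stored in the list-column `data`
  ungroup() %>%
  mutate(
    # build one graph per run from its edge list
    graph = map(data, ~ graph_from_data_frame(select(.x, edge1, edge2, weights))),
    # one number per graph, so map_dbl() gives a numeric column
    density = map_dbl(graph, edge_density),
    # one value per vertex, so this stays a list-column
    degree_centrality = map(graph, igraph::degree)
  )

stats$density
```

An individual graph can be taken out of the list-column with double brackets, for example `stats$graph[[2]]` for the second row.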

edited May 18 '17 at 16:42

answered May 17 '17 at 23:07


Intriguing . . . I'll try it out and see if I can get the centrality and density stats I want. I like the way you inserted the selection into the graph call. – zazizoma May 17 '17 at 23:10
So I've got the list-column of graphs, but I haven't been successful in using the graphs individually to obtain density and centrality stats. I've even mirrored your suggestion and tried `PDensities <- PGraphs %>% mutate(PDensity = map(data, function(x) graph.density(x)))` but receive "Not a graph object" error messages. Thanks also for sending the docs link, very interesting, but it recommends broom, which doesn't appear to tidy network graphs. How would I obtain, for example, graph.density for each of the graphs? I'd love to use this method. – zazizoma May 18 '17 at 03:12
Each graph is stored as a cell in the data frame. You could extract the graph in, say, the 2nd row and 3rd column with `my_dataframe[2, 3]` as usual. There's probably a more efficient mapping operation, which maps an extract function to each cell of the list column. But the fundamental idea is that you can extract things from the list column just as from any other column. – Heisenberg May 18 '17 at 14:57
I added an operation to the pipeline that takes in the graphs and spits out the density. – Heisenberg May 18 '17 at 16:43
That works, and is fast. Thanks so much, I'm using this. – zazizoma May 18 '17 at 18:22
