
This is a follow-up question to a recent issue I encountered when calculating graph depth. It involves tidyverse and tidygraph. After reading up on tidygraph I decided to give it a proper try, but I ran into a new problem in my workflow.

When I use the group_by() verb from dplyr to create a graph for each group, the guess_df_type() function in as_tbl_graph() from tidygraph does not do what I'm looking for, and I can't find a way to set the from and to values as intended. Here's a reproducible example:

library(tidygraph)
library(tidyverse)

tmp <- tibble(
  id_head = as.integer(c(4,4,4,4,4,4,5,5,5,5)),
  id_sec  = as.integer(c(1,1,1,2,2,2,1,1,2,2)),
  token   = as.integer(c(1,2,3,1,2,3,1,2,1,2)),
  head    = as.integer(c(2,2,2,1,1,2,2,2,2,2)),
  root    = as.integer(c(2,2,2,1,1,1,2,2,2,2))
) 
tmp %>%
  group_by(id_head, id_sec) %>% 
  as_tbl_graph()

The result of this is:

# A tbl_graph: 4 nodes and 10 edges
#
# An undirected multigraph with 1 component
#
# Node Data: 4 x 1 (active)
   name
  <chr>
1     4
2     5
3     1
4     2
#
# Edge Data: 10 x 5
   from    to token  head  root
  <int> <int> <dbl> <dbl> <dbl>
1     1     3     1     2     2
2     1     3     2     2     2
3     1     3     3     2     2
# ... with 7 more rows

The nodes are not taken from the token column but from both id_head and id_sec.

After looking further into it, I renamed token and head to from and to, which at least solves the first issue:

tmp %>% 
  rename(
    from = token,
    to = head
  ) %>% 
  as_tbl_graph(directed = FALSE) 

This results in:

# A tbl_graph: 3 nodes and 10 edges
#
# An undirected multigraph with 1 component
#
# Node Data: 3 x 1 (active)
   name
  <chr>
1     1
2     2
3     3
#
# Edge Data: 10 x 5
   from    to id_head id_sec  root
  <int> <int>   <int>  <int> <int>
1     1     2       4      1     2
2     2     2       4      1     2
3     2     3       4      1     2
# ... with 7 more rows

That leaves the second issue. When I try to use group_by(id_head, id_sec) on the graph itself, the result is an error:

tmp %>% 
  as_tbl_graph() %>%
  group_by(id_head, id_sec)

Error in grouped_df_impl(data, unname(vars), drop) :

Column id_head is unknown

So either way, I do not understand how to use group_by with tidygraph. Any help is very much appreciated! Thanks in advance.

Also, sorry for using igraph as a tag; it should be tidygraph, but that tag does not exist yet. tidygraph is built upon igraph and the tidyverse, though.

emilliman5
Paavo Pohndorff

1 Answer

For the first question, I'm a bit unsure how your data.frame should be parsed into a graph. tidygraph documents all the graph representations it understands, and I suggest you consult that documentation.

For the second question, it is simply a matter of the nodes being active while the edges contain the variables you want to group on. Activate the edges prior to grouping:

tmp %>% 
  rename(
    from = token,
    to = head
  ) %>%
  as_tbl_graph() %>%
  activate(edges) %>%
  group_by(id_head, id_sec)
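
As a rough illustration of what the grouped edges allow, the sketch below counts the edges in each (id_head, id_sec) group. The column name edges_in_group is made up for this example, and the code assumes a tidygraph version in which mutate() respects groups on the active component:

```r
library(tidygraph)
library(dplyr)

# Same mock data as in the question
tmp <- tibble::tibble(
  id_head = as.integer(c(4,4,4,4,4,4,5,5,5,5)),
  id_sec  = as.integer(c(1,1,1,2,2,2,1,1,2,2)),
  token   = as.integer(c(1,2,3,1,2,3,1,2,1,2)),
  head    = as.integer(c(2,2,2,1,1,2,2,2,2,2)),
  root    = as.integer(c(2,2,2,1,1,1,2,2,2,2))
)

edge_tbl <- tmp %>%
  rename(from = token, to = head) %>%
  as_tbl_graph() %>%
  activate(edges) %>%                 # id_head and id_sec live on the edges
  group_by(id_head, id_sec) %>%
  mutate(edges_in_group = n()) %>%    # hypothetical column: edges per group
  as_tibble()                         # pull out the (active) edge data
```

Note that this still produces one graph whose edges carry a grouping, not a separate graph per group.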
Paavo Pohndorff
ThomasP85
  • Regarding the first part: The data is a mockup of text data, and both `token` and `head` represent the relationships of tokens to each other in sentences (`id_sec`) within messages (`id_head`) as dependency trees. – Paavo Pohndorff Jan 03 '18 at 14:42
  • So it's sort of a linked list? – ThomasP85 Jan 03 '18 at 14:45
  • You could probably work on it similarly to a linked list, but I'm not sure that helps in understanding it. [Here](https://en.wikipedia.org/wiki/Dependency_grammar#Representing_dependencies)'s a better example of what dependency trees in NLP may look like. In my case every token is connected to the root token by at least one edge. – Paavo Pohndorff Jan 03 '18 at 15:05
  • There is no built-in support for this graph representation. If you pass a single data.frame to `as_tbl_graph`, it will expect it to be an edge list, but yours is a node list with parent/child recorded as variables. I would suggest you extract the edge information into a separate data frame and just do `tbl_graph(nodes, edges)` – ThomasP85 Jan 04 '18 at 09:41
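
The `tbl_graph(nodes, edges)` suggestion in the last comment could look roughly like the sketch below. It assumes each row of `tmp` is one token, that `head` holds the within-sentence index of the token's parent, and that rows with `token == head` mark a root and should not produce an edge. `node_id` and `head_id` are hypothetical helper columns introduced only for this example.

```r
library(tidygraph)
library(dplyr)

# Same mock data as in the question
tmp <- tibble::tibble(
  id_head = as.integer(c(4,4,4,4,4,4,5,5,5,5)),
  id_sec  = as.integer(c(1,1,1,2,2,2,1,1,2,2)),
  token   = as.integer(c(1,2,3,1,2,3,1,2,1,2)),
  head    = as.integer(c(2,2,2,1,1,2,2,2,2,2)),
  root    = as.integer(c(2,2,2,1,1,1,2,2,2,2))
)

# Node list: one row per token, with a row index that is unique across sentences
nodes <- tmp %>%
  mutate(node_id = row_number())

# Edge list: look up the row index of each token's head within the same sentence
edges <- nodes %>%
  filter(token != head) %>%           # assumption: self-reference marks the root
  inner_join(
    nodes %>% select(id_head, id_sec, token, head_id = node_id),
    by = c("id_head", "id_sec", "head" = "token")
  ) %>%
  transmute(from = head_id, to = node_id, id_head, id_sec)

# Integer from/to values refer to row positions in `nodes`
graph <- tbl_graph(nodes = nodes, edges = edges, directed = TRUE)

# The grouping variables now exist on the nodes, so the default
# active component can be grouped directly
grouped <- graph %>%
  activate(nodes) %>%
  group_by(id_head, id_sec)
```

With this layout every sentence becomes its own connected component of a single graph, and `id_head` and `id_sec` are available on both nodes and edges.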