
I have the following string, produced by a Bayesian network learning algorithm (for example, from the bnlearn or deal packages):

[1] "[wst|af:bq:rloss_s:pre3][af|bq][d|wst:af:con:rloss_s][bq|con][con|af][rloss_s|af:con:pre3][pre3|af:con]"

The string defines the connections between variables and their direction. The first variable of each bracketed term ([...]) is a node, and all variables after the | are the nodes with an arc pointing to that first node (its parents). These variables are separated by :.

I would like to transform the string into a data.frame that represents the connection between each variable. It should look like this:

> data.frame(string_table)
      from      to
1       af     wst
2       bq     wst
3  rloss_s     wst
4     pre3     wst
5       bq      af
6      wst       d
7       af       d
8      con       d
9  rloss_s       d
10     con      bq
11      af     con
12      af rloss_s
13     con rloss_s
14    pre3 rloss_s
15      af    pre3
16     con    pre3
viktor_r

2 Answers


I would use the graph tools here rather than string manipulation. Here is an example to illustrate:

library(bnlearn)

d = clgaussian.test
m = hc(d)

So you have the model and its model string:

bnlearn::modelstring(m)
#[1] "[A][B][C][H][D|A:H][F|B:C][E|B:D][G|A:D:E:F]"

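Using bnlearn, loop over the nodes to get the parents of each one (stack() then returns the parents in the values column and the child node in the ind column):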

stack(sapply(nodes(m), function(x) parents(m, x)))

Or use igraph on the adjacency matrix to get the edge list:

library(igraph)
as_edgelist(graph_from_adjacency_matrix(amat(m)))

EDIT:

It seems bnlearn has a function that extracts the edges directly:

arcs(m)
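
If only the model string is available and not the fitted object, one possible route is to rebuild the network from the string first. The sketch below assumes bnlearn's model2network() and uses the example model string from above; the names s, net and edges are illustrative. Note that the string in the question appears to contain a directed cycle (bq is a parent of af, con of bq, and af of con), so it may well be rejected by this approach.

```r
library(bnlearn)

# Example model string taken from modelstring(m) above
s <- "[A][B][C][H][D|A:H][F|B:C][E|B:D][G|A:D:E:F]"

# Rebuild a network object from the string (assumes the graph is acyclic)
net <- model2network(s)

# arcs() returns a character matrix with columns "from" and "to";
# convert it to a data frame with one row per arc
edges <- as.data.frame(arcs(net), stringsAsFactors = FALSE)
edges
```

Root nodes such as [A] have no parents and therefore contribute no rows.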
user2957945

You can do this in two steps. First, use regular expressions (such as the str_match_all function in the stringr package) to extract a matrix of pairs:

s <- "[wst|af:bq:rloss_s:pre3][af|bq][d|wst:af:con:rloss_s][bq|con][con|af][rloss_s|af:con:pre3][pre3|af:con]"

library(stringr)
m <- str_match_all(s, "\\[(.*?)\\|(.*?)\\]")[[1]]
m

This gives the following matrix, whose third and second columns hold the "from" and "to" values we're interested in:

     [,1]                       [,2]      [,3]                
[1,] "[wst|af:bq:rloss_s:pre3]" "wst"     "af:bq:rloss_s:pre3"
[2,] "[af|bq]"                  "af"      "bq"                
[3,] "[d|wst:af:con:rloss_s]"   "d"       "wst:af:con:rloss_s"
[4,] "[bq|con]"                 "bq"      "con"               
[5,] "[con|af]"                 "con"     "af"                
[6,] "[rloss_s|af:con:pre3]"    "rloss_s" "af:con:pre3"       
[7,] "[pre3|af:con]"            "pre3"    "af:con"            

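Then, put them in a data frame, split the "from" values at the colons into a list column, and use tidyr's unnest() to create one row per from-to pair.

library(tidyr)
df <- data.frame(from = m[, 3], to = m[, 2], stringsAsFactors = FALSE)
# Turn "from" into a list column: one character vector of parents per row
df$from <- str_split(df$from, ":")
# Expand the list column so that each parent gets its own row
string_table <- unnest(df, from)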
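
For comparison, the same two steps can be sketched in base R without any packages. This is only one possible variant, not part of the approach above; the intermediate names terms, parts, child and parents are illustrative.

```r
s <- "[wst|af:bq:rloss_s:pre3][af|bq][d|wst:af:con:rloss_s][bq|con][con|af][rloss_s|af:con:pre3][pre3|af:con]"

# Drop the outer brackets, then split on "][" to get one term per node
terms <- strsplit(gsub("^\\[|\\]$", "", s), "][", fixed = TRUE)[[1]]

# Keep only terms that list parents (root nodes such as "[A]" have no "|")
terms <- terms[grepl("|", terms, fixed = TRUE)]

# Split each term into the child node and its colon-separated parents
parts   <- strsplit(terms, "|", fixed = TRUE)
child   <- vapply(parts, `[`, character(1), 1)
parents <- strsplit(vapply(parts, `[`, character(1), 2), ":", fixed = TRUE)

# One row per parent-child pair; repeat each child once per parent
string_table <- data.frame(
  from = unlist(parents),
  to   = rep(child, lengths(parents)),
  stringsAsFactors = FALSE
)
string_table
```

The rows come out in the order the terms appear in the string, which matches the layout requested in the question.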
David Robinson