I have a large data frame containing transaction data. Each transaction refers to a person contributing to an artifact (e.g. a developer modified a file).
I want to convert this data into a bipartite networkDynamic graph in which persons and artifacts are the nodes and transactions are the edges, each active only at the point in time of its transaction. Two nodes can share multiple transactions, in which case a single edge is activated at multiple times.
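To make the intended data model concrete, here is a minimal base-R sketch (independent of networkDynamic) that groups the transaction times by contributor-artifact pair, so each pair corresponds to one edge with one or more activation times. The names `tx` and `activation_times` are only illustrative; the values are the ones from my example further down.

```r
# Transactions from the example in this question (endpoints and times only).
tx <- data.frame(
  contributorId = c("u1", "u1", "u2"),
  artifactId    = c("a1", "a1", "a2"),
  instantId     = c(1000, 2000, 3000),
  stringsAsFactors = FALSE
)

# One list entry per distinct contributor-artifact pair (i.e. per edge);
# each entry holds the points in time at which that edge is active.
activation_times <- split(
  tx$instantId,
  paste(tx$contributorId, tx$artifactId, sep = " -> ")
)

print(activation_times)
```

The pair u1/a1 ends up with two activation times, which is the "one edge, several spells" case described above.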
So far so good. Ultimately I need to track the evolution of this network, e.g. by measuring statistics such as its connectedness at multiple points in time.
For some reason, I keep running into problems. For example, in the reproducible code example below, the second-to-last call of tSnaStats warns about multiple attribute values. That in itself should be fine, since one edge is active more than once within the query spell. What is strange is that it uses the earliest value although I specified the rule as latest. The last call (using gtrans) even fails with an error.
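To illustrate why I expect more than one attribute value per query spell, here is a rough base-R sketch that counts how many transactions of each edge fall into each query window. It assumes the windows are half-open intervals [onset, onset + aggregate.dur) that start at the first observed time and advance by time.interval; that is my reading of how the slicing works, not something I have verified in the tsna source. The names `tx`, `count_activations` and `activations` are only illustrative.

```r
# Transactions from the example in this question (endpoints and times only).
tx <- data.frame(
  contributorId = c("u1", "u1", "u2"),
  artifactId    = c("a1", "a1", "a2"),
  instantId     = c(1000, 2000, 3000),
  stringsAsFactors = FALSE
)

# Assumed query windows: [onset, onset + dur), stepping by `interval`
# from the first to the last observed transaction time.
interval <- 1001
dur      <- 1001
onsets   <- seq(min(tx$instantId), max(tx$instantId), by = interval)

# For one window, count the transactions per contributor-artifact pair.
count_activations <- function(onset) {
  inside <- tx[tx$instantId >= onset & tx$instantId < onset + dur, ]
  if (nrow(inside) == 0) return(NULL)
  counts <- aggregate(instantId ~ contributorId + artifactId,
                      data = inside, FUN = length)
  names(counts)[3] <- "activations"
  cbind(window.onset = onset, counts)
}

activations <- do.call(rbind, lapply(onsets, count_activations))
print(activations)
```

Under these assumptions the first window covers both transactions of the edge between u1 and a1, so two weight values compete there and the rule has to pick one of them.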
So my first question is whether my network construction code is valid (maybe I have misunderstood something). If it is, is this a bug in the tsna package?
if (!require("pacman")) install.packages("pacman")
library("pacman")
pacman::p_load(network, networkDynamic, tsna)
dfTransactions <-
  structure(
    list(
      weight = c(1, 2, 2),
      contributorweight = c(1, 2, 2),
      artifactweight = c(1, 2, 2),
      contributorId = c("u1", "u1", "u2"),
      instantId = c(1000, 2000, 3000),
      artifactId = c("a1", "a1", "a2")
    ),
    .Names = c(
      "weight",
      "contributorweight",
      "artifactweight",
      "contributorId",
      "instantId",
      "artifactId"
    ),
    row.names = c(1L, 2L, 3L),
    class = "data.frame"
  )
dfEdges <- unique(dfTransactions[, c("contributorId", "artifactId")])
veUniqueContributors <- unique(dfEdges[[1]])
veUniqueArtifacts <- unique(dfEdges[[2]])
veUniqueVertices <- c(veUniqueContributors, veUniqueArtifacts)
nuNrUniqueContributors <- length(veUniqueContributors)
nuNrUniqueArtifacts <- length(veUniqueArtifacts)
nuNrUniqueVertices <- length(veUniqueVertices)
# Point spells: onset and terminus are both the transaction time.
# Edge spell columns are read by position: onset, terminus, tail, head, weight.
dfEdgeSpells <-
  dfTransactions[c("instantId",
                   "instantId",
                   "contributorId",
                   "artifactId",
                   "weight")]
dfContributorSpells <-
  dfTransactions[c("instantId",
                   "instantId",
                   "contributorId",
                   "contributorweight")]
dfArtifactSpells <-
  dfTransactions[c("instantId", "instantId", "artifactId", "artifactweight")]
names(dfContributorSpells) <-
  c("onset", "terminus", "vertex.id", "weight")
names(dfArtifactSpells) <-
  c("onset", "terminus", "vertex.id", "weight")
# Stack contributor and artifact spells into one vertex spell table
dfVertexSpells <- rbind(dfContributorSpells, dfArtifactSpells)
# Convert vertex names to vertex ids
dfEdgeSpells[["contributorId"]] <-
  match(dfEdgeSpells[["contributorId"]], veUniqueVertices)
dfEdgeSpells[["artifactId"]] <-
  match(dfEdgeSpells[["artifactId"]], veUniqueVertices)
dfVertexSpells$vertex.id <-
  match(dfVertexSpells$vertex.id, veUniqueVertices)
net <- network.initialize(
  nuNrUniqueVertices,
  directed = TRUE,
  hyper = FALSE,
  loops = FALSE,
  multiple = FALSE,
  bipartite = nuNrUniqueContributors
)
net %v% "vertex.names" <- veUniqueVertices
net %v% "vertex.type" <-
  c(rep("contributor", nuNrUniqueContributors),
    rep("artifact", nuNrUniqueArtifacts))
net <- networkDynamic(
  net,
  create.TEAs = TRUE,
  edge.spells = dfEdgeSpells,
  edge.TEA.names = c("weight"),
  vertex.spells = dfVertexSpells,
  vertex.TEA.names = c("weight")
)
reconcile.vertex.activity(net = net,
                          mode = "encompass.edges",
                          edge.active.default = TRUE)
# Returns Warning
tSnaStats(
  net,
  "connectedness",
  time.interval = 1001,
  aggregate.dur = 1001,
  rule = "latest"
)
# Returns Error
tSnaStats(
  net,
  "gtrans",
  time.interval = 100,
  aggregate.dur = 100,
  rule = "latest"
)