
I have a ragged data frame with each row as an occurrence in time of one or more entities, like so:

(time1) entitya entityf entityz
(time2) entityg entityh
(time3) entityo entityp entityk entityL
(time4) entityM

I want to create an edge list for network analysis from a subset of entities found in a second vector (nodelist). My problem is that I don't know:

1). How to subset only the entities in the nodelist. I was considering

datanew<- subset(dataold, dataold %in% nodelist)

but it doesn't work.

2). How to turn the ragged data frame into a two-column edge list. In the above example, the first row would become:

entitya entityf
entitya entityz
entityz entityf
...

I have no idea how to do this. Any help is really appreciated!

PearsonArtPhoto
Olga Mu
  • In what form does the "ragged data frame" come? Is it an object in R (if so, what class and can you provide it to us via `dput`?) or is it just a text file at this point? – flodel Dec 09 '12 at 00:20
  • It's from a column in a csv file that I imported and then split via strsplit and apply. So it's a list that I can make into a vector. – Olga Mu Dec 09 '12 at 05:35

1 Answer

Try this:

# read your data 

dat <- strsplit(readLines(textConnection("(time1) entitya entityf entityz
(time2) entityg entityh
(time3) entityo entityp entityk entityL
(time4) entityM")), " ")

# remove (time)

dat <- lapply(dat, `[`, -1)

# filter

nodelist <- c("entitya", "entityf", "entityz", "entityg", "entityh",
              "entityo", "entityp", "entityk")

dat <- lapply(dat, intersect, nodelist)

# create an edge matrix

t(do.call(cbind, lapply(dat[sapply(dat, length) >= 2], combn, 2)))

This last step might be a lot to digest, so here is a breakdown:

  • sapply(dat, length) computes the length of each list element
  • dat[... >= 2] keeps only the list elements with at least two items, since a single item cannot form a pair
  • lapply(..., combn, 2) creates all pairs within each element: a list of two-row (wide) matrices
  • do.call(cbind, ...) binds all those matrices side by side into one wide matrix
  • t(...) transposes it into a tall matrix with one edge per row
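
To make the steps above easier to inspect, here is a minimal sketch that wraps them in a function. The name make_edges and the early return for an empty result are my own additions for illustration, not part of the code above, and lengths(x) is used as a shorthand for sapply(x, length).

```r
# Hypothetical helper (illustration only): same steps as above, one per line.
make_edges <- function(rows, nodelist) {
  # keep only the entities that appear in nodelist, row by row
  kept <- lapply(rows, intersect, nodelist)
  # rows with fewer than two entities cannot form a pair
  kept <- kept[lengths(kept) >= 2]
  # assumption: return an empty two-column matrix when nothing is left
  if (length(kept) == 0) {
    return(matrix(character(0), ncol = 2))
  }
  # combn(x, 2) gives a two-row matrix with one column per pair
  pairs <- lapply(kept, combn, 2)
  # bind the pairs side by side, then transpose to one edge per row
  t(do.call(cbind, pairs))
}

rows <- list(
  c("entitya", "entityf", "entityz"),
  c("entityg", "entityh"),
  c("entityo", "entityp", "entityk", "entityL"),
  "entityM"
)
nodelist <- c("entitya", "entityf", "entityz", "entityg", "entityh",
              "entityo", "entityp", "entityk")

edges <- make_edges(rows, nodelist)
print(edges)
```

Because combn is applied to each list element separately, a pair only shows up in the result if both entities occur in the same row of the data. Entities from different rows are never paired with each other.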
flodel
  • Thank you so much! This is really helpful, but I should have specified: I need the combinations to be only those found in the original matrix. The current answer seems to give me all combinations of unique elements in the data. Any further hint? – Olga Mu Dec 15 '12 at 21:43
  • What do you mean by "the original matrix" and is it tied to what you called the "nodelist" in your question? That part of your question was unclear. See that in my code I create an object called `nodelist` which is a subset of all the entities found in your original data. You are free to make that `nodelist` whatever you like to subset your data further. – flodel Dec 15 '12 at 21:52