how to select submatrix in a (adjacency) matrix based on ties, in R

Question

I have a matrix, which represents mobility between various jobs:

 jobdat <- matrix(c(
           295,  20,   0,    0,    0,    5,    7,
           45,   3309, 15,   0,    0,    0,    3,
           23,   221,  2029, 5,    0,    0,    0,
           0,    0,    10,   100,  8,    0,    3,
           0,    0,    0,    0,    109,  4,    4,
           0,    0,    0,    0,    4,    375,  38,
           0,    18,   0,    0,    4,    26,   260), 
           nrow = 7, ncol = 7, byrow = TRUE,
           dimnames = list(c("job 1","job 2","job 3","job 4","job 5","job 6","job 7"),
                c("job 1","job 2","job 3","job 4","job 5","job 6","job 7")))

This is treated as a directed, weighted adjacency matrix in a social network analysis. The direction of the network is from rows to columns: So mobility is defined as going from a job-row to a job-column. The diagonal is meaningful, since it is possible to change to the same job in another firm.

For part of my analysis I want to select a submatrix which consists of job 1, job 5 and job 7:

work.list <- c(1,5,7)
jobpick_wrong <- jobdat[work.list,work.list]

However, this only gives the direct ties among these three jobs. What I need is this:

jobpick_right <- matrix(c(
          295,  20,   0,    5,    7,
          45,   3309, 0,    0,    3,
          0,    0,    109,  4,    4,
          0,    0,    4,    375,  38,
          0,    18,   4,    26,   260),
          nrow = 5, ncol = 5, byrow = TRUE,
          dimnames = list(c("job 1","job 2","job 5","job 6","job 7"),
                    c("job 1","job 2","job 5","job 6","job 7")))

Here, jobs 2 and 6 are also included, since at least one of jobs 1, 5 or 7 has a direct tie to them. Jobs 3 and 4 are excluded, because none of jobs 1, 5 or 7 has a tie to them.

I'm not sure how to go about this. Do I have to transform it into an igraph object to get anywhere?

library(igraph)
net <- graph_from_adjacency_matrix(jobdat, mode = "directed", weighted = TRUE)  # graph.adjacency() in older igraph code

and then maybe use the ego/neighborhood functions, also from the igraph package? But I'm really not sure how, or whether this is the best way to go about it.

Thank you for your time,

Emil Begtrup-Bright

Augmented question:

The answer by aichao is perfect for the question as asked, but it turns out that another step is needed. Once work.list has been extended to include the jobs that have ties to the three "jobs of interest" (jobs 1, 5 and 7 in this example), the amount of clutter in real data makes a further step desirable: only the direct ties to and from the three jobs of interest should be kept, while ties between the other jobs are set to zero.

The data above does not illustrate this very well, so I have created a very simplified version of it to demonstrate:

jobdat <- matrix(c(
1,   0,   1,   0,   0,   0,   0,
1,   1,   1,   0,   0,   0,   0,
1,   1,   1,   0,   0,   0,   0,
0,   0,   0,   1,   0,   0,   0,
0,   0,   0,   0,   1,   0,   0,
0,   0,   0,   0,   0,   1,   0,
0,   0,   0,   0,   0,   0,   1
           ), 
           nrow = 7, ncol = 7, byrow = TRUE,
           dimnames = list(c("job 1","job 2","job 3","job 4","job 5","job 6","job 7"),
                c("job 1","job 2","job 3","job 4","job 5","job 6","job 7")))

Using aichao's solution:

work.list <- sort(unique(unlist(lapply(work.list, function(x) which(jobdat[x,] != 0)))))

we get this:

jobdat[work.list,work.list]
#          job 1 job 2 job 3 job 5 job 7
#    job 1     1     0     1     0     0
#    job 2     1     1     1     0     0
#    job 3     1     1     1     0     0
#    job 5     0     0     0     1     0
#    job 7     0     0     0     0     1

However, the ties between job 2 and job 3 are irrelevant and only serve to obscure the ties of interest. What I want instead is this:

jobdat.result <- matrix(c(
1,     0,     1,     0,     0,
1,     1,     0,     0,     0,
1,     0,     1,     0,     0,
0,     0,     0,     1,     0,
0,     0,     0,     0,     1
           ), 
           nrow = 5, ncol = 5, byrow = TRUE,
           dimnames = list(c("job 1","job 2","job 3","job 5","job 7"),
                c("job 1","job 2","job 3","job 5","job 7")))

In jobdat.result, the ties between job 2 and job 3 have been removed in both directions (row-wise and column-wise), but the ties between these two jobs and the three jobs of interest are kept. Ideally, it should be possible to choose whether the diagonal entries of job 2 and job 3 are also set to zero. Most likely I'll set the diagonal to zero for all jobs anyway, so this is not required, but it would be nice, if only so that I understand the logic at a higher level.

What I am trying to achieve, among other things, is circlegrams (chord diagrams), so keeping the number of ties small is important. Such a diagram can be produced like this:

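library(igraph)        # graph_from_adjacency_matrix(), as_data_frame()
library(circlize)      # chordDiagram()
library(RColorBrewer)  # brewer.pal()

segmentcircle <- jobdat
diag(segmentcircle) <- 0  # drop self-ties before plotting
# edge list with columns from, to, weight
df.c <- igraph::as_data_frame(
  graph_from_adjacency_matrix(segmentcircle, mode = "directed", weighted = TRUE),
  what = "edges")
colour <- brewer.pal(ncol(segmentcircle), "Set1")
names(colour) <- colnames(segmentcircle)  # map colours to sectors by job name
chordDiagram(x = df.c,
             grid.col = colour,
             transparency = 0.2,
             directional = 1, symmetric = FALSE,
             direction.type = c("arrows", "diffHeight"), diffHeight = -0.065,
             link.arr.type = "big.arrow",
             # self.link = 1,
             link.sort = TRUE, link.largest.ontop = TRUE,
             link.border = "black"
             # link.lwd = 2,
             # link.lty = 2
             )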

Which way is your direction? Is it from rows to columns or columns to rows? — aichao, Nov 02 '16 at 13:13
oh yeah, right: from rows to columns: So mobility is defined as going from a job-row to a jobcolumn. The diagonal is meaningful, since it is possible to change to the same job in another firm. — emilBeBri, Nov 02 '16 at 13:21
`idx <- colSums(jobdat[work.list,] > 0)!=0;jobdat[idx, idx]` — Khashaa, Nov 02 '16 at 13:25

score 2 · Accepted Answer · edited Jun 20 '20 at 09:12

Assuming your directed graph is from rows to columns, you can augment your work.list with the columns that are connected (i.e. have an element != 0) to any row in work.list. You can do this by:

work.list <- sort(unique(unlist(lapply(work.list, function(x) which(jobdat[x,] != 0)))))

unique() drops the duplicate column indices collected across the rows, and sort() puts the remaining indices in ascending order. Then:

jobdat[work.list,work.list]
##      job 1 job 2 job 5 job 6 job 7
##job 1   295    20     0     5     7
##job 2    45  3309     0     0     3
##job 5     0     0   109     4     4
##job 6     0     0     4   375    38
##job 7     0    18     4    26   260
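To make the one-liner easier to follow, here is a sketch that runs it step by step on the question's first matrix. The intermediate variable names (`per.job`, `all.cols`, `aug`, `sub`) are made up for illustration; the logic is the same `lapply`/`unlist`/`unique`/`sort` chain as above.

```r
# Question's first mobility matrix (rows = origin job, columns = destination job)
jobs <- paste("job", 1:7)
jobdat <- matrix(c(
   295,   20,    0,   0,   0,   5,   7,
    45, 3309,   15,   0,   0,   0,   3,
    23,  221, 2029,   5,   0,   0,   0,
     0,    0,   10, 100,   8,   0,   3,
     0,    0,    0,   0, 109,   4,   4,
     0,    0,    0,   0,   4, 375,  38,
     0,   18,    0,   0,   4,  26, 260),
  nrow = 7, byrow = TRUE, dimnames = list(jobs, jobs))
work.list <- c(1, 5, 7)

# Step 1: for each job of interest, the columns it sends a nonzero tie to
per.job <- lapply(work.list, function(x) which(jobdat[x, ] != 0))
# Step 2: flatten the list into one integer vector (duplicates still included)
all.cols <- unlist(per.job)
# Step 3: drop the duplicates, then order by column index
aug <- sort(unique(all.cols))
aug  # 1 2 5 6 7

# Step 4: take rows and columns for the augmented set of jobs
sub <- jobdat[aug, aug]
sub
```

Each job of interest shows up in its own row of `per.job` only because it has a nonzero diagonal here; a job of interest with no outgoing ties to itself or to the other selected jobs would not be kept.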

If instead, your directed graph is from columns to rows:

work.list <- sort(unique(unlist(lapply(work.list, function(x) which(jobdat[,x] != 0)))))
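Both directions can also be handled without `lapply()`, using the same `colSums()` idea as Khashaa's comment under the question (and `rowSums()` for the reverse direction). The helper `tied.jobs()` and its `direction` argument below are hypothetical, a sketch rather than anything provided by igraph or base R:

```r
# Question's first mobility matrix (rows = origin job, columns = destination job)
jobs <- paste("job", 1:7)
jobdat <- matrix(c(
   295,   20,    0,   0,   0,   5,   7,
    45, 3309,   15,   0,   0,   0,   3,
    23,  221, 2029,   5,   0,   0,   0,
     0,    0,   10, 100,   8,   0,   3,
     0,    0,    0,   0, 109,   4,   4,
     0,    0,    0,   0,   4, 375,  38,
     0,   18,    0,   0,   4,  26, 260),
  nrow = 7, byrow = TRUE, dimnames = list(jobs, jobs))

# Hypothetical helper: indices of the jobs tied to `nodes`.
#   "out": columns with a nonzero entry in any of the `nodes` rows (rows -> columns)
#   "in" : rows with a nonzero entry in any of the `nodes` columns (columns -> rows)
tied.jobs <- function(m, nodes, direction = c("out", "in")) {
  direction <- match.arg(direction)
  hit <- if (direction == "out") {
    colSums(m[nodes, , drop = FALSE] != 0) > 0
  } else {
    rowSums(m[, nodes, drop = FALSE] != 0) > 0
  }
  unname(which(hit))  # which() already returns ascending indices, so no sort() needed
}

work.list <- c(1, 5, 7)
tied.jobs(jobdat, work.list, "out")  # 1 2 5 6 7, as in the answer
tied.jobs(jobdat, work.list, "in")   # 1 2 3 4 5 6 7
```

With the question's first matrix, `direction = "in"` returns all seven jobs, since every job sends at least one tie to job 1, 5 or 7, so the choice of direction matters a lot here.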

Updated for augmented question:

With the new jobdat:

jobdat <- matrix(c(
  1,   0,   1,   0,   0,   0,   0,
  1,   1,   1,   0,   0,   0,   0,
  1,   1,   1,   0,   0,   0,   0,
  0,   0,   0,   1,   0,   0,   0,
  0,   0,   0,   0,   1,   0,   0,
  0,   0,   0,   0,   0,   1,   0,
  0,   0,   0,   0,   0,   0,   1
), 
nrow = 7, ncol = 7, byrow = TRUE,
dimnames = list(c("job 1","job 2","job 3","job 4","job 5","job 6","job 7"),
                c("job 1","job 2","job 3","job 4","job 5","job 6","job 7")))

and the list of relevant jobs in work.list:

work.list <- c(1,5,7)

Compute the augmented work list aug.work.list as the collection of jobs that have a direct tie to the relevant jobs in work.list. This will include jobs 2 and 3. Note that we use which(jobdat[,x] != 0) instead of which(jobdat[x,] != 0) here to identify the jobs (relevant or irrelevant) that have a tie into the relevant job x in work.list.

aug.work.list <- sort(unique(unlist(lapply(work.list, function(x) which(jobdat[,x] != 0)))))
##[1] 1 2 3 5 7

This results in:

jobdat.result <- jobdat[aug.work.list, aug.work.list]
##      job 1 job 2 job 3 job 5 job 7
##job 1     1     0     1     0     0
##job 2     1     1     1     0     0
##job 3     1     1     1     0     0
##job 5     0     0     0     1     0
##job 7     0     0     0     0     1

Now, to remove the connections between irrelevant jobs, first find the indices of these irrelevant jobs in jobdat.result, i.e. the positions of the elements of aug.work.list that are not in work.list:

irrelevant.job.indices <- which(!(aug.work.list %in% work.list))
##[1] 2 3

Note that these are not the job numbers of the irrelevant jobs but their (row and column) indices in jobdat.result. In this case, they just happen to coincide with the job numbers themselves.
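The difference between positions and job numbers is easiest to see in a case where they do not coincide. Here the out-tie result for the question's first matrix, (1, 2, 5, 6, 7), is used as an illustrative aug.work.list; `setdiff()` and `match()` are shown as an equivalent route:

```r
work.list     <- c(1, 5, 7)
aug.work.list <- c(1, 2, 5, 6, 7)  # illustrative: out-tie result for the first jobdat

# positions inside the sub-matrix jobdat[aug.work.list, aug.work.list]
irrelevant.job.indices <- which(!(aug.work.list %in% work.list))
irrelevant.job.indices                  # [1] 2 4

# the job numbers those positions stand for
aug.work.list[irrelevant.job.indices]   # [1] 2 6

# same positions, found from the job numbers instead
match(setdiff(aug.work.list, work.list), aug.work.list)  # [1] 2 4
```

Indexing the sub-matrix with the job numbers (2 and 6) instead of the positions (2 and 4) would hit the wrong rows and columns.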

Removing the connections requires setting the off-diagonal elements of the sub-matrix of jobdat.result indexed by irrelevant.job.indices to 0. To do this:

## first, keep diagonal values for irrelevant.job.indices
dvals <- diag(jobdat.result)[irrelevant.job.indices]
## set sub-matrix to zero (this will also set diagonal elements to zero)
jobdat.result[irrelevant.job.indices,irrelevant.job.indices] <- 0
## replace diagonal elements
diag(jobdat.result)[irrelevant.job.indices] <- dvals

The result is:

jobdat.result
##      job 1 job 2 job 3 job 5 job 7
##job 1     1     0     1     0     0
##job 2     1     1     0     0     0
##job 3     1     0     1     0     0
##job 5     0     0     0     1     0
##job 7     0     0     0     0     1
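The steps above can be wrapped into one function, which also covers the question's wish to choose whether the added jobs keep their diagonal. This is a sketch: the name `focus.submatrix()` and the `keep.diag` argument are hypothetical, and instead of saving and restoring the diagonal it zeroes the off-diagonal cells with a `row()`/`col()` mask, which gives the same result.

```r
# Simplified 7 x 7 matrix from the augmented question (rows = origin job)
jobs <- paste("job", 1:7)
jobdat <- matrix(c(
  1, 0, 1, 0, 0, 0, 0,
  1, 1, 1, 0, 0, 0, 0,
  1, 1, 1, 0, 0, 0, 0,
  0, 0, 0, 1, 0, 0, 0,
  0, 0, 0, 0, 1, 0, 0,
  0, 0, 0, 0, 0, 1, 0,
  0, 0, 0, 0, 0, 0, 1),
  nrow = 7, byrow = TRUE, dimnames = list(jobs, jobs))

# Hypothetical wrapper around the steps in this answer.
#   m         : square adjacency matrix, direction rows -> columns
#   work.list : indices of the jobs of interest
#   keep.diag : keep the self-ties (diagonal) of the added, irrelevant jobs?
focus.submatrix <- function(m, work.list, keep.diag = TRUE) {
  # jobs with a direct tie into at least one job of interest (column-wise lookup)
  aug <- sort(unique(unlist(lapply(work.list, function(x) which(m[, x] != 0)))))
  res <- m[aug, aug, drop = FALSE]
  # positions (not job numbers) of the added jobs inside `res`
  irr <- which(!(aug %in% work.list))
  if (length(irr) > 0) {
    block <- res[irr, irr, drop = FALSE]
    if (keep.diag) {
      block[row(block) != col(block)] <- 0  # zero the off-diagonal ties only
    } else {
      block[] <- 0                          # zero the whole block, diagonal included
    }
    res[irr, irr] <- block
  }
  res
}

focus.submatrix(jobdat, c(1, 5, 7))                     # same as jobdat.result above
focus.submatrix(jobdat, c(1, 5, 7), keep.diag = FALSE)  # self-ties of jobs 2 and 3 also 0
```

As in the answer, jobs are picked by their incoming ties (`m[, x]`), so a job of interest whose column is entirely zero would not appear in the result. The case raised in the comments, where one added job (say job 2) has ties to another added job (say job 6), is handled the same way, because both jobs fall inside the zeroed block.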

A related question, which I could add to my question if you know the answer: How would you go about doing what you do there, but then keep only the mobility from and to the three relevant jobs 1, 5 and 7? My example doesn't contain this, but suppose the irrelevant job 2 had ties to the likewise irrelevant job 6, and that this is considered "noise": how would I set these irrelevant ties to zero? This might seem a bit beyond the question, but as I see it, it actually just makes the answer more complete. I'll modify my answer accordingly if you can answer this. Thank you — emilBeBri, Nov 08 '16 at 14:13
@EmilBB: I think what you want to find are the indices to the augmented `work.list` that are not in the original `work.list`; then use these to set those off-diagonal elements of the current result to zero. To find those do: `set.to.zero <- which(!(aug.work.list %in% work.list))` where now `aug.work.list <- sort(unique(unlist(lapply(work.list, function(x) which(jobdat[x,] != 0)))))` — aichao, Nov 08 '16 at 14:38
@EmilBB: Did the above address your augmented question? If so, I can edit my answer above to reflect this addition. If not, please clarify your question by editing the question itself. — aichao, Nov 08 '16 at 20:32
hmm I've clarified my question instead, to make it easier for others to follow. Also added a simpler example to make this "augmented" task easier to understand for others. — emilBeBri, Nov 09 '16 at 09:29
Hi again aichao, I have augmented my question, your suggestion doesn't seem to be working the way it should. Would appreciate it if you had the time to look at it. — emilBeBri, Nov 25 '16 at 10:28
@EmilBB: in your augmented question, if we apply the original solution, `job2` should **not** be in the resulting `work.list` because in `jobdat` column `2` is not `1` for rows `1`, `5`, `7` in the original `work.list`. I can give you an updated answer for the case where `jobdat[1,2]=1` that will remove the ties between `job2` and `job3` but will have a tie from `job1` to `job2` if you'd like. — aichao, Nov 28 '16 at 01:55
Yes, that is exactly it! Only the ties from the "irrelevant" jobs that goes directly to the relevant jobs should be kept, like from job2 to job1, as you say. Would appreciate it, have been trying myself of course but as of yet it's just too high level for me. — emilBeBri, Nov 29 '16 at 10:25
thanks a lot, the helpfulness of this place blasts me away sometimes. You have a great day, you deserve it. — emilBeBri, Dec 01 '16 at 10:41
