0

So I have a table that that tells the freq (N) of two variables (V1 and V2) appearing together. Here is a sample:

> dput(ans)
structure(list(V1 = c(2L, 7L, 7L, 7L, 7L, 7L, 9L, 9L, 9L, 10L, 
10L, 11L, 12L, 12L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 14L, 14L, 
14L, 14L, 15L, 15L, 15L, 16L, 16L, 16L, 16L, 17L, 17L, 17L, 20L, 
20L, 21L, 25L, 29L, 29L, 29L, 33L, 35L, 38L, 42L, 46L, 46L, 46L, 
46L, 46L, 46L, 46L, 46L, 46L, 46L, 46L, 46L, 47L, 47L, 48L, 52L, 
52L, 52L, 52L, 52L, 56L, 56L, 56L, 56L, 56L, 56L, 56L, 57L, 57L, 
57L, 57L, 57L, 57L, 58L, 58L, 58L, 58L, 58L, 59L, 59L, 59L, 59L, 
60L, 60L, 60L, 61L, 61L, 62L, 65L, 65L, 65L, 65L, 67L, 67L, 67L, 
68L, 70L, 70L, 71L, 73L, 73L, 74L), V2 = c(3L, 8L, 20L, 21L, 
22L, 78L, 10L, 11L, 12L, 11L, 12L, 12L, 38L, 39L, 14L, 15L, 16L, 
17L, 18L, 29L, 64L, 15L, 16L, 17L, 18L, 16L, 17L, 18L, 17L, 18L, 
29L, 30L, 18L, 29L, 30L, 21L, 22L, 22L, 26L, 30L, 47L, 64L, 34L, 
36L, 39L, 43L, 47L, 48L, 49L, 52L, 65L, 67L, 70L, 71L, 72L, 73L, 
74L, 75L, 48L, 49L, 49L, 65L, 67L, 73L, 74L, 75L, 57L, 58L, 59L, 
60L, 61L, 62L, 63L, 58L, 59L, 60L, 61L, 62L, 63L, 59L, 60L, 61L, 
62L, 63L, 60L, 61L, 62L, 63L, 61L, 62L, 63L, 62L, 63L, 63L, 67L, 
73L, 74L, 75L, 73L, 74L, 75L, 69L, 71L, 72L, 72L, 74L, 75L, 75L
), N = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L)),
 row.names = c(NA, -108L), class = c("data.table", "data.frame"))

I want to convert it to a 696x696 matrix where I have V1 and V2 as the rows and columns (from 1-696 in both rows and columns), and N as the values. V1 and V2 represents materials in my dataset. If a V1 and V2 combination does not exist in the table, the value should be 0. This is because I want to cluster the materials based on their freq of appearing together, using the hclust with centroid function.

EDIT: Only way I can give an example of the expected output is a picture from an article i'm following: enter image description here

J. Win.
  • 6,662
  • 7
  • 34
  • 52
M.A
  • 65
  • 1
  • 8
  • I fail to get the output you want, could you provide a mock-up example? Also, please use `dput()` to provide a data example. – LAP May 24 '18 at 09:33
  • @LAP better now? – M.A May 24 '18 at 09:43
  • Hm, I'm getting an error `unexpected '>' in: row.names = c(NA, -108L), class = c("data.table", "data.frame"), .internal.selfref = <"`. But that may just be me. Still, more important to me would be an example of your expected output. – LAP May 24 '18 at 09:46
  • Sorry, I don't know how to explain it. For me, the objective is to cluster the materials based on their freq the appear together, or even create a distance matrix and cluster from that. To create the table, I have followed the answer from this: https://stackoverflow.com/questions/50482870/count-occurrences-of-a-variable-having-two-given-values-corresponding-to-one-val/50483840?noredirect=1#comment88011802_50483840 – M.A May 24 '18 at 09:56

2 Answers2

0

That's a common task for rasters... using the raster package and converting it back to matrix may not be the fastest solution, but it works well on your test data (here named as df)...

library(raster)

r <- raster(nrow=696, ncol=696, crs = NA,
            xmn = 0, xmx = 696, ymn = 0, ymx = 696)
# some indexing corrections
new_xy <- cbind(df[, 2] - 1, 697 - df[, 1])
cells <- cellFromXY(r, new_xy)
r[] <- 0
r[cells] <- unlist(df[, 3])
r <- as.matrix(r)

Then we can check with str(r) it is a 696x696 numeric, and max(r) is a value of 3, as expected. Also, r[2, 3] = 1

J. Win.
  • 6,662
  • 7
  • 34
  • 52
  • I thought that was your desired behavior: "If a V1 and V2 combination does not exist in the table, the value should be 0." – J. Win. May 24 '18 at 10:54
  • `plot(crop(r, extent(c(0, 100, 0, 100))))` zooms in and shows the values are in there... I'm working on the indexing something is strange. – J. Win. May 24 '18 at 11:11
  • library(Matrix); sparseMatrix(df$V1, df$V2, x = df$N) ... might suit your purposes better – J. Win. May 24 '18 at 11:29
0

To duplicate the picture you have added to the original question, I'd do something like this:

# convert your contingency table to the appropriate matrix
M <- sparseMatrix(df$V1, df$V2, x = df$N, dims = c(696, 696))
M <- as.matrix(M)
rownames(M) <- 1:696
colnames(M) <- 1:696

There are many formatting options for displaying the matrix to image, but to start, try:

View(M)

enter image description here

J. Win.
  • 6,662
  • 7
  • 34
  • 52