I am trying to create a 9 x 9 probability matrix from a contingency / frequency table.
It contains the frequencies for a pair of values (x1,x2) transitioning to a pair of values (y1,y2). x1 and y1 take the values A, B, or C, and x2 and y2 take the values D, E, or F.
Not all transitions between xy pairs exist in the data. However, I would like these 'missing' transitions to be present as zeros in the table / matrix, to make it square (9x9) for use in other analyses.
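For reference, the nine pair labels do not have to be typed by hand. A minimal sketch, where first, second, and all_pairs are names made up for this illustration and not part of my actual code:

```r
first  <- c("A", "B", "C")   # possible values of x1 / y1
second <- c("D", "E", "F")   # possible values of x2 / y2

# outer() pastes every combination into a 3 x 3 matrix;
# t() plus as.vector() reads it out with the first letter varying slowest
all_pairs <- as.vector(t(outer(first, second, paste0)))
all_pairs
```

This gives the labels in the same order as the levels passed to factor() further down.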
df <- structure(list(x1 = structure(c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L,
3L, 1L, 2L, 3L), .Label = c("A", "B", "C"), class = "factor"),
y1 = structure(c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L,
2L, 3L), .Label = c("A", "B", "C"), class = "factor"),
x2 = structure(c(1L,2L, 3L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 1L),
.Label = c("D", "E", "F"), class = "factor"),
y2 = structure(c(1L, 2L, 3L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 3L),
.Label = c("D", "E", "F"), class = "factor"),
x = c("AD", "BE", "CF", "AD", "BD", "CD", "AE", "BE", "CE", "AE", "BF", "CD"),
y = c("AD", "BE", "CF", "AE", "BD", "CD", "AD", "BD", "CD", "AE", "BE", "CF")),
.Names = c("x1", "y1", "x2", "y2", "x", "y"), row.names = c(NA, -12L), class = "data.frame")
# df$x <- paste0(df$x1, df$x2) # included in the dput
# df$y <- paste0(df$y1,df$y2)
# convert to factor to include all transitions http://stackoverflow.com/a/13705236/1670053
df$x <- factor(df$x, levels = c("AD", "AE", "AF", "BD", "BE", "BF", "CD", "CE", "CF"))
df$y <- factor(df$y,levels = c("AD", "AE", "AF", "BD", "BE", "BF", "CD", "CE", "CF") )
t1 <- with(df,(table(x,y)))
# t1m <- as.data.frame.matrix(t1)
t2 <- t1/(colSums(t1))
dfm <- as.data.frame.matrix(t2)
#dm <- as.matrix(dfm)
The result dfm, above, without using factor on x and y has the correct values, but of course does not include the full set of 9x9 transitions. The desired result, DFMd, is below.
However, when I include the factored x and y, the result is not what I want: values of NA and Inf are introduced.
Is there a way, when using factors with 'missing' levels, to evaluate table/colSums(table) and get the desired result?
DFMd <- structure(list(AD = c(0.5, 0.5, 0, 0, 0, 0, 0, 0, 0), AE = c(0.5,
0.5, 0, 0, 0, 0, 0, 0, 0), AF = c(0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L), BD = c(0, 0, 0, 0.5, 0.5, 0, 0, 0, 0), BE = c(0, 0,
0, 0, 0.5, 0.5, 0, 0, 0), BF = c(0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L), CD = c(0, 0, 0, 0, 0, 0, 0.5, 0.5, 0), CE = c(0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L), CF = c(0, 0, 0, 0, 0, 0, 0.5, 0,
0.5)), .Names = c("AD", "AE", "AF", "BD", "BE", "BF", "CD", "CE",
"CF"), class = "data.frame", row.names = c("AD", "AE", "AF",
"BD", "BE", "BF", "CD", "CE", "CF"))
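One possible direction, sketched under the assumption that column-wise proportions are what is wanted (which is how DFMd is laid out). Dividing a 9x9 table by colSums() recycles the sums down the rows, so a zero column sum can end up dividing a non-zero count, which would explain the Inf values. The sketch below uses prop.table() with margin = 2 instead and then zeroes the 0/0 cells. The names lv, p, and dfm_fixed are introduced here only for illustration; x and y are copied from the dput above.

```r
# Pair labels copied from the dput above
x <- c("AD", "BE", "CF", "AD", "BD", "CD", "AE", "BE", "CE", "AE", "BF", "CD")
y <- c("AD", "BE", "CF", "AE", "BD", "CD", "AD", "BD", "CD", "AE", "BE", "CF")
lv <- c("AD", "AE", "AF", "BD", "BE", "BF", "CD", "CE", "CF")

# Full 9 x 9 frequency table, including levels that never occur
t1 <- table(x = factor(x, levels = lv), y = factor(y, levels = lv))

# Divide each column by its own sum (margin = 2 means columns)
p <- prop.table(t1, margin = 2)

# Columns that sum to zero produce 0/0 = NaN; set those cells to 0
p[is.nan(p)] <- 0

dfm_fixed <- as.data.frame.matrix(p)
```

sweep(t1, 2, colSums(t1), "/") should be an equivalent way to write the division step. Whether an all-zero column is acceptable for the later analyses is a separate question, since such a column is not a valid probability distribution.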