I'm reading through the documentation of the Matrix package in R, but I can't understand the p argument of this function:

sparseMatrix(i = ep, j = ep, p, x, dims, dimnames,
         symmetric = FALSE, index1 = TRUE,
         giveCsparse = TRUE, check = TRUE)

According to http://stat.ethz.ch/R-manual/R-devel/library/Matrix/html/sparseMatrix.html

p:
numeric (integer valued) vector of pointers, one for each column (or row), to the initial (zero-based) index of elements in the column (or row). Exactly one of i, j or p must be missing.

I figured p is a compressed representation of either the row or the column indices, since it is wasteful to repeat the same value in i or j for every entry that shares a row or column. But when I tried the example provided, I still couldn't work out how p controls which element of x goes to which row or column:

dn <- list(LETTERS[1:3], letters[1:5])
## pointer vectors can be used, and the (i,x) slots are sorted if necessary:
m <- sparseMatrix(i = c(3,1, 3:2, 2:1), p= c(0:2, 4,4,6), x = 1:6, dimnames = dn)
GorillaInR

1 Answer

Just read a bit further down in ?sparseMatrix to learn how p is interpreted. (In particular, note the bit about the "expanded form" of p.)

If ‘i’ or ‘j’ is missing then ‘p’ must be a non-decreasing integer vector whose first element is zero. It provides the compressed, or “pointer” representation of the row or column indices, whichever is missing. The expanded form of ‘p’, ‘rep(seq_along(dp),dp)’ where ‘dp <- diff(p)’, is used as the (1-based) row or column indices.

Here is a little function that will help you see what that means in practice:

pex <- function(p) {
    dp <- diff(p)
    rep(seq_along(dp), dp)
}

## Play around with the function to discover the indices encoded by p.
pex(p = c(0,1,2,3))
# [1] 1 2 3

pex(p = c(0,0,1,2,3))
# [1] 2 3 4

pex(p = c(10,11,12,13))
# [1] 1 2 3

pex(p = c(0,0,2,5))
# [1] 2 2 3 3 3

pex(p = c(0,1,3,3,3,3,8))
# [1] 1 2 2 6 6 6 6 6
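
To tie this back to the example in the question, here is a minimal sketch that expands its p into column indices and pairs them with i and x. The dense base-R matrix is only there to make the placement visible; it is an illustration of the indexing, not how sparseMatrix() stores the data.

```r
# Helper from above: expand a pointer vector into 1-based indices.
pex <- function(p) {
    dp <- diff(p)
    rep(seq_along(dp), dp)
}

i <- c(3, 1, 3:2, 2:1)   # row indices from the question's example
p <- c(0:2, 4, 4, 6)     # column pointers from the question's example
x <- 1:6

j <- pex(p)              # expanded column indices: 1 2 3 3 5 5
# Column 4 receives no entries because p[4] == p[5], so diff(p)[4] is 0.

# Dense illustration only (base R), using the question's dimnames.
dense <- matrix(0L, nrow = 3, ncol = length(p) - 1,
                dimnames = list(LETTERS[1:3], letters[1:5]))
dense[cbind(i, j)] <- x  # the k-th value of x lands at (i[k], j[k])
dense
```

So x[1] = 1 goes to (C, a), x[2] = 2 to (A, b), x[3] and x[4] both go into column c, and x[5] and x[6] both go into column e.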
Josh O'Brien
  • Thanks, got it now. So the value of the ith element of p is the number of x elements already included in the first (i-1) columns. In other words, the difference between the ith and the (i-1)th elements of p is the number of x elements in column (i-1) – GorillaInR Nov 15 '13 at 19:52
  • @GorillaInR -- Yep. Strictly speaking, the penultimate sentence in your comment isn't accurate, but the final sentence is. So yeah, looks like you've got it. – Josh O'Brien Nov 15 '13 at 20:01
  • If the final statement is true, then the penultimate statement is true too: each difference is the number of x elements in the corresponding column, so the sum of the first (i-1) differences = the total number of x elements already included in the first (i-1) columns = the value of the ith element of p – GorillaInR Nov 15 '13 at 22:06
  • @GorillaInR -- But see the results of `pex(p = c(10,11,12,13))`. – Josh O'Brien Nov 15 '13 at 22:25
  • But p has to start with 0: "If i or j is missing then p must be a non-decreasing integer vector whose first element is zero." Anyway, thank you for pointing out the vital part of p's explanation; otherwise I would never have figured it out – GorillaInR Nov 16 '13 at 00:56
  • @GorillaInR -- Right you are. Thanks for pointing *that* bit out. – Josh O'Brien Nov 16 '13 at 02:05
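
The point settled in these comments (with p starting at 0, each element of p is a running count of the entries in the preceding columns) can be sketched by going the other way, from column indices back to p. The helper name compress_p is hypothetical and not part of the Matrix package, and it assumes the indices are 1-based and already sorted by column.

```r
# Hypothetical helper (not part of Matrix): compress sorted, 1-based
# column indices j into a zero-based pointer vector for ncol columns.
compress_p <- function(j, ncol) {
    counts <- tabulate(j, nbins = ncol)  # number of entries in each column
    c(0L, cumsum(counts))                # running total, starting at zero
}

# Column indices encoded by the question's p = c(0:2, 4, 4, 6)
p2 <- compress_p(c(1, 2, 3, 3, 5, 5), ncol = 5)
p2
```

This recovers the pointer vector from the question, including the repeated 4 that marks the empty fourth column.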