
I have an input data frame in the following format:

   queryid  wifi rssi
1 0004920b wifi1   10
2 0004920b wifi2   20
3 1114920b wifi3   15
4 11000492 wifi1  -10

I want to create a sparse matrix from this data frame, for example:

queryid   wifi1  wifi2  wifi3
0004920b   10      20    .
1114920b    .       .    15
11000492   -10      .    .
Martin Schmelzer
user3151261

2 Answers

1

I initially thought this was a duplicate of "Create Sparse Matrix from a data frame", but I encountered errors relating to the requirement that the indices used in assignment-indexing of sparse matrices be numeric, and the `queryid` and `wifi` columns appear to be factors (or character). I'm going to assume they are factors, but users should check.

library(Matrix)
(M <- with( dat, sparseMatrix(i= as.numeric(queryid), j=as.numeric(wifi),x=rssi)))
#------
3 x 3 sparse Matrix of class "dgCMatrix"

[1,]  10 20  .
[2,] -10  .  .
[3,]   .  . 15
dimnames(M) <- list( levels(dat$queryid), levels(dat$wifi) )
#-------
> M
3 x 3 sparse Matrix of class "dgCMatrix"
         wifi1 wifi2 wifi3
0004920b    10    20     .
11000492   -10     .     .
1114920b     .     .    15

It would actually be more difficult to accomplish if these were character columns. Thinking about it (but not testing), I'd probably use this code after creating factors for the character columns.
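
For character columns, the approach described above might look like the following. This is a minimal sketch rather than tested production code: the data frame `dat` is rebuilt from the question's example, and the helper names `q` and `w` are only illustrative. The factors are created explicitly, so the integer codes and the dimnames come from the same `levels()` and cannot drift apart.

```r
library(Matrix)

# Example data from the question; queryid and wifi are character here
dat <- data.frame(
  queryid = c("0004920b", "0004920b", "1114920b", "11000492"),
  wifi    = c("wifi1", "wifi2", "wifi3", "wifi1"),
  rssi    = c(10, 20, 15, -10),
  stringsAsFactors = FALSE
)

# Create the factors explicitly (q and w are illustrative names)
q <- factor(dat$queryid)
w <- factor(dat$wifi)

# Integer codes index into levels(), so use the same levels for the dimnames
M <- sparseMatrix(
  i = as.integer(q),
  j = as.integer(w),
  x = dat$rssi,
  dims = c(nlevels(q), nlevels(w)),
  dimnames = list(levels(q), levels(w))
)
M
```

If the rows should follow the order of first appearance instead of the sorted level order, `factor(dat$queryid, levels = unique(dat$queryid))` can be used in place of `factor(dat$queryid)`. Note also that `sparseMatrix` by default adds up the `x` values of repeated (i, j) pairs, which matters if the same `queryid`/`wifi` combination can occur more than once.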

IRTFM
  • Thanks for your elegant code! Yes, when `queryid` and `wifi` are of type character, errors occur. – user3151261 Apr 11 '18 at 08:23
  • **This answer is wrong for factor variables**! `as.numeric` returns the underlying integer representation of the factor variables. These integers are not ordered by the order in which the values appear in the `data.frame`. `unique`, however, gives the values in the order that they appear in the `data.frame`. This is why the rows 1114920b and 11000492 are flipped from what they should be in the example. – T.C. Proctor Sep 21 '18 at 15:45
  • I *think* that replacing `unique` with `levels` will actually work. However, the `factor` and `as.numeric` documentation contains warnings that are awfully scary but too long to put in a comment: [factor](https://www.rdocumentation.org/packages/base/versions/3.5.1/topics/factor) [as.numeric](https://www.rdocumentation.org/packages/base/versions/3.5.1/topics/numeric) – T.C. Proctor Sep 21 '18 at 15:54
  • @T.C Proctor : Fixed. `levels` orders the factor names the same way `factor` does, while `unique` doesn't reorder the values but rather takes them in whatever order they appear in the sequence. So it's not the factor or as.numeric help pages that should be read for warnings but rather the `?unique` page. – IRTFM Sep 21 '18 at 21:20
  • @42- This warning in as.numeric was what was scaring/confusing me: "If x is a factor, as.numeric will return the underlying numeric (integer) representation, which is often meaningless as it may not correspond to the factor levels." My understanding was that this code assumes that "the underlying numeric representation" is the order that items are given in "the factor levels" - ie, the two correspond. Maybe if you remove a factor from the levels, the integers won't change, so there isn't a correspondence? – T.C. Proctor Sep 26 '18 at 16:17
  • The warning is perfectly valid. I was just pointing out that the "reordering" (from what I expected from my knowledge of factor construction) was being done by `unique`, which wasn't really reordering at all but rather keeping the order in the vector. The warning is most commonly relevant when data input infelicities create what the user expects to be numeric, but which actually get turned into factors. – IRTFM Sep 26 '18 at 20:18
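
The relationship between factor codes, `levels`, and `unique` discussed in these comments can be checked with a small base R experiment. This is only an illustrative sketch using the `queryid` values from the question; the variable names `x` and `y` are made up for the example.

```r
# queryid values from the question, as a factor
x <- factor(c("0004920b", "0004920b", "1114920b", "11000492"))

levels(x)                # levels are sorted, not in order of appearance
as.integer(x)            # codes: positions within levels(x)
unique(as.character(x))  # values in order of first appearance

# The codes always index into the levels of the same factor
identical(levels(x)[as.integer(x)], as.character(x))

# Subsetting keeps unused levels, so the codes still refer to the old levels
y <- x[x != "11000492"]
levels(y)                # still three levels
as.integer(y)            # codes unchanged for the remaining values

# droplevels() removes unused levels and renumbers the codes
as.integer(droplevels(y))
```

The practical consequence for the sparse matrix is that row and column names should be taken from `levels()` of the same factors whose codes are used as indices. Unused levels left over after subsetting would show up as empty rows or columns unless `droplevels()` is applied first.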
1

Here is a short version:

library(tidyverse)
library(Matrix)
df %>% 
  spread(wifi, rssi, fill = 0) %>%
  column_to_rownames("queryid") %>%
  as.matrix(.) %>%
  Matrix(., sparse = TRUE)

Output:

3 x 3 sparse Matrix of class "dgCMatrix"
         wifi1 wifi2 wifi3
0004920b    10    20     .
11000492   -10     .     .
1114920b     .     .    15
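
In current versions of tidyr, `spread` is superseded by `pivot_wider`. The same pipeline might be written as below. This is a sketch under assumptions: `df` is rebuilt here from the question's example, and the intermediate name `wide` is only illustrative. As with the `spread` version, a dense wide table is built before conversion, so this route suits data whose wide form fits comfortably in memory.

```r
library(tidyr)
library(tibble)
library(Matrix)

# Example data from the question
df <- data.frame(
  queryid = c("0004920b", "0004920b", "1114920b", "11000492"),
  wifi    = c("wifi1", "wifi2", "wifi3", "wifi1"),
  rssi    = c(10, 20, 15, -10),
  stringsAsFactors = FALSE
)

# One column per wifi value; missing combinations are filled with 0
wide <- pivot_wider(df, names_from = wifi, values_from = rssi, values_fill = 0)

# Move queryid into the row names, then convert the dense matrix to sparse
M <- Matrix(as.matrix(column_to_rownames(wide, "queryid")), sparse = TRUE)
M
```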
Martin Schmelzer