EDIT: The original dataset can be found here: link

I have a matrix like:

data <- matrix(c("a","1","10",
             "b","1","20",
             "c","1","30",
             "a","2","10",
             "b","2","20",
             "a","3","10",
             "c","3","20"),
             ncol=3, byrow=TRUE)

I would like to reshape it into a data frame in which every combination of the first two columns is present, with the missing values set to zero:

data <- matrix(c("a","1","10",
             "b","1","20",
             "c","1","30",
             "a","2","10",
             "b","2","20",
             "c","2","0",
             "a","3","10",
             "b","3","0",
             "c","3","20"),
             ncol=3, byrow=TRUE)

How can I do it with the reshape package? Thanks.

chopin_is_the_best

4 Answers

1

We can use complete from tidyr, after converting your data a little:

library(tidyr)
data <- as.data.frame(data)
data$V3 <- as.numeric(as.character(data$V3))
complete(data, V1, V2, fill = list(V3 = 0))
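
For readers wondering what complete() does here, the following is a minimal sketch of the equivalent steps, assuming dplyr is also installed. The tidyr documentation describes complete() as a wrapper around expand(), a full join, and replace_na(). The names df, all_pairs, and filled are illustrative only.

```r
library(tidyr)
library(dplyr)

# The matrix from the question, converted to a data frame with a numeric V3
data <- matrix(c("a","1","10", "b","1","20", "c","1","30",
                 "a","2","10", "b","2","20",
                 "a","3","10", "c","3","20"),
               ncol = 3, byrow = TRUE)
df <- as.data.frame(data, stringsAsFactors = FALSE)
df$V3 <- as.numeric(df$V3)

# Step 1: every combination of the V1 and V2 values present in the data
all_pairs <- expand(df, V1, V2)

# Step 2: join the observed rows back on; absent combinations get NA in V3
filled <- full_join(all_pairs, df, by = c("V1", "V2"))

# Step 3: replace those NAs with zero
filled <- replace_na(filled, list(V3 = 0))
```

Spelling the steps out like this can help when debugging a failing complete() call, since each intermediate result can be inspected on its own.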
jeremycg
  • On my data it says: Error in left_join_impl(x, y, by$x, by$y) : attempt to set index 0/0 in SET_STRING_ELT. In my true dataset V3 is (int). Does that affect it somehow? – chopin_is_the_best Nov 10 '15 at 14:58
  • I fear there's something wrong with the data I shared, but I can't understand what. Here's the original data: [link](https://drive.google.com/file/d/0B4FnlzCZUFqWcHVVd3RXQnQwQzQ/view?usp=sharing) – chopin_is_the_best Nov 10 '15 at 15:11
  • from your data, I think you want `complete(data, label, count, fill = list(unique_elements = 0))` ? – jeremycg Nov 10 '15 at 15:20
1

tidyr is the better option, but if you want to use reshape you can do it with reshape2:

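library(reshape2)

# Cast to wide format; missing V1/V2 combinations become NA
data2 <- dcast(as.data.frame(data, stringsAsFactors = FALSE), V1 ~ V2, value.var = "V3")
# Melt back to long format, one row per V1/V2 combination
data3 <- melt(data2, id.vars = "V1", variable.name = "V2", value.name = "V3")
# Replace the missing values with "0" (the values are still character here)
data3$V3[is.na(data3$V3)] <- "0"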
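
A possible shortcut, assuming V3 is converted to numeric first: dcast() accepts a fill argument for combinations that have no row, so the separate NA replacement can be skipped. This is only a sketch; the names df, wide, and long are illustrative.

```r
library(reshape2)

# The matrix from the question, converted to a data frame with a numeric V3
data <- matrix(c("a","1","10", "b","1","20", "c","1","30",
                 "a","2","10", "b","2","20",
                 "a","3","10", "c","3","20"),
               ncol = 3, byrow = TRUE)
df <- as.data.frame(data, stringsAsFactors = FALSE)
df$V3 <- as.numeric(df$V3)

# Cast to wide format; fill = 0 is used for V1/V2 combinations with no row
wide <- dcast(df, V1 ~ V2, value.var = "V3", fill = 0)

# Melt back to long format, restoring the original column names
long <- melt(wide, id.vars = "V1", variable.name = "V2", value.name = "V3")
```

Note that melt() returns V2 as a factor, so it may need converting back with as.character() or as.numeric(as.character()) depending on what comes next.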
Batanichek
1

Seems to me like you are handling something like a multivariate time series. Therefore I would suggest using a proper time series object.

library(zoo)
res=read.zoo(data.frame(data,stringsAsFactors=FALSE),
         split=1,
         index.column=2,
         FUN=as.numeric)
coredata(res)=as.numeric(coredata(res))
coredata(res)[is.na(res)]=0

This gives

res
#  a  b  c 
#1 10 20 30
#2 10 20 0 
#3 10 0  20
cryo111
1

I think the problem starts with storing values of different types in a matrix: a matrix can hold only one type, so everything ends up as character.

First I would convert to a data.frame or to a data.table and then convert all the columns to the proper type. Something like

library(data.table) # V 1.9.6+
# Convert to data.table
DT <- as.data.table(data) 

# Convert to correct column types
for (j in names(DT)) set(DT, j = j, value = type.convert(DT[[j]], as.is = TRUE))

Then we can expand rows using data.table::CJ and assign zeroes to NA values

## Cross join all column except the third
DT <- DT[do.call(CJ, c(unique = TRUE, DT[, -3, with = FALSE])), on = names(DT)[-3]]

## Or if you want only to operate on these two columns you can alternatively do
# DT <- DT[CJ(V1, V2, unique = TRUE), on = c("V1", "V2")]

## Fill with zeroes
DT[is.na(V3), V3 := 0]
DT
#    V1 V2 V3
# 1:  a  1 10
# 2:  a  2 10
# 3:  a  3 10
# 4:  b  1 20
# 5:  b  2 20
# 6:  b  3  0
# 7:  c  1 30
# 8:  c  2  0
# 9:  c  3 20
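
The same cross-join-then-fill idea can also be sketched in base R, in case no extra package is available. This is an illustrative alternative, not part of the data.table approach above; expand.grid() plays the role of CJ, and the names df, grid, and full are made up for the example.

```r
# The matrix from the question, converted to properly typed columns
data <- matrix(c("a","1","10", "b","1","20", "c","1","30",
                 "a","2","10", "b","2","20",
                 "a","3","10", "c","3","20"),
               ncol = 3, byrow = TRUE)
df <- as.data.frame(data, stringsAsFactors = FALSE)
df$V2 <- as.integer(df$V2)
df$V3 <- as.numeric(df$V3)

# All combinations of the distinct V1 and V2 values
grid <- expand.grid(V1 = unique(df$V1), V2 = unique(df$V2),
                    stringsAsFactors = FALSE)

# Left join the observed rows onto the grid; absent combinations get NA
full <- merge(grid, df, by = c("V1", "V2"), all.x = TRUE)

# Fill with zeroes and order by the two key columns
full$V3[is.na(full$V3)] <- 0
full <- full[order(full$V1, full$V2), ]
```

On larger data the data.table join is likely to be faster, so this version is mainly useful for small inputs or for seeing the logic step by step.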
David Arenburg
  • I fear there's something wrong with the data I shared, but I can't understand what. Here's the original data: [link](https://drive.google.com/file/d/0B4FnlzCZUFqWcHVVd3RXQnQwQzQ/view?usp=sharing) – chopin_is_the_best Nov 10 '15 at 15:11