-2

As a newbie in R, how do I correctly handle a variable that holds multiple values, like this:

x = c("1","1","1/2","2","2/3","1/3")

As you can see, the value 3 only appears in conjunction with others.

To work with x further, the best approach would be to obtain 3 vectors like:

X[1] = c(1,1,1,NA,NA,1)

because "1" appears in the 1st, 2nd, 3rd and 6th positions; likewise for X[2] and X[3].

All information seems to be preserved this way. Am I wrong?

I have already tried strsplit, but it does not insert NA for the values that are absent from a given element of my vector.
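For reference, a minimal sketch of what strsplit returns for this vector, assuming "/" is the only separator. It shows why no NA appears: the result is a list of character vectors of different lengths, not a rectangular structure.

```r
x <- c("1", "1", "1/2", "2", "2/3", "1/3")

# strsplit() returns a list with one character vector per element of x;
# the pieces have different lengths and contain no NA placeholders.
parts <- strsplit(x, "/", fixed = TRUE)
str(parts)

# Number of values found in each element of x
lengths(parts)
```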

Jilber Urbina

2 Answers

2

An alternative is to use cSplit_e from my "splitstackshape" package.

x = c("1","1","1/2","2","2/3","1/3")
library(splitstackshape)
cSplit_e(data.frame(x), "x", "/")
#     x x_1 x_2 x_3
# 1   1   1  NA  NA
# 2   1   1  NA  NA
# 3 1/2   1   1  NA
# 4   2  NA   1  NA
# 5 2/3  NA   1   1
# 6 1/3   1  NA   1

(Note that the results here are transposed in comparison to the results in the accepted answer.)

A5C1D2H2I1M1N2O1R2T1
0

This seems to work:

x = c("1","1","1/2","2","2/3","1/3")

#Split on your character. This may not be inclusive of all characters that 
#need to be split on.
xsplit <- strsplit(x, "\\/")
#Find the unique items
xunique <- unique(unlist(xsplit))

#Iterate over each xsplit for all unique values
out <- sapply(xsplit, function(z)  
  sapply(xunique, function(zz) zz %in% z)
)
#convert FALSE to NA
out[out == FALSE] <- NA

#Results in
> out
  [,1] [,2] [,3] [,4] [,5] [,6]
1 TRUE TRUE TRUE   NA   NA TRUE
2   NA   NA TRUE TRUE TRUE   NA
3   NA   NA   NA   NA TRUE TRUE
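A variant of the same idea, sketched under the assumption that numeric 1/NA coding (as in the question) is wanted rather than TRUE/NA. It swaps the nested sapply for vapply and uses ifelse for the recoding; the name `out_num` is introduced here only for illustration.

```r
x <- c("1", "1", "1/2", "2", "2/3", "1/3")
xsplit <- strsplit(x, "/", fixed = TRUE)
xunique <- unique(unlist(xsplit))

# vapply() checks that every result is a logical vector of the expected length
out <- vapply(xsplit, function(z) xunique %in% z, logical(length(xunique)))
rownames(out) <- xunique

# Numeric 1/NA coding; ifelse() keeps the matrix shape and row names
out_num <- ifelse(out, 1, NA_real_)

# Row for the value "1", i.e. the X[1] vector from the question
out_num["1", ]
```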
Chase
  • Almost! It seems that I can't access all values of the first row or column together as I wanted to in my example: `out[1]` gives me only one element, whereas I expected a vector for the first row – Cyrille GUIBERT Sep 08 '12 at 14:32
  • @CyrilleGUIBERT - `out` is a matrix, so you need to index it with the `matrix[i,j]` notation, where `i` refers to the rows and `j` the columns. So `out[1,]` will give you the first row and all columns. a "blank" index means "all rows or columns" respectively. – Chase Sep 08 '12 at 14:41
  • OK, I accept the answer. One more question: is this a good way to validate such data and then compute on them like any other vector inside my data.frame? – Cyrille GUIBERT Sep 08 '12 at 14:45
  • @CyrilleGUIBERT - to be honest, not really sure. Your data "looks" like it should be numeric, but you seem to be treating it as character data for some reason. What sort of validation are you trying to do? – Chase Sep 08 '12 at 14:52
  • As I said, I'm a newbie in R. Other statistical software treats such data as multiple-valued variables (because they are often produced from checkboxes as multiple-choice options). The further computations are nothing extraordinary: descriptive stats and cross-tabulation against other variables, for example. Obviously I transform `TRUE` into 1 to get numeric values (with `out[out==TRUE] <- 1`) – Cyrille GUIBERT Sep 08 '12 at 15:05
  • @CyrilleGUIBERT - I still think I'm confused, but as long as you know what you're doing, then I guess that's what matters :) I'm not sure how a survey response to a checkbox question would return something like "1/2" or "2/3". Every software I've worked with generally coded checkboxes as binary, 1 == checked, 0 == not checked. Also, FYI: `TRUE` is equivalent to `1`; you can see this with `TRUE == 1` or `as.numeric(TRUE)` – Chase Sep 08 '12 at 22:45
  • Now I realize that the matrix can't easily be inserted into the original data.frame. Unless I add one vector to my data.frame for each row of the matrix, I can't reasonably analyze the data in the matrix against another field of the data.frame. An example would be `sex=c("15","25","31","30","75","55")` and x, both "fields" of a global data.frame; it would be a three-dimensional data.frame. I could compare them one by one, but the goal is to compare globally. Perhaps there is no need to split at all? I am only searching for good/best solutions... – Cyrille GUIBERT Sep 12 '12 at 16:49
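
The points raised in the comments (matrix indexing, logicals behaving as 0/1, and lining the indicators up with the original data.frame) can be sketched in base R roughly as follows. This is only one possible approach, not the method of either answer: the column `other` and the names `ind` and `df2` are hypothetical, and FALSE is kept instead of NA so that counts and cross-tables work directly.

```r
# Hypothetical data.frame: x from the question plus an illustrative second column
df <- data.frame(x = c("1", "1", "1/2", "2", "2/3", "1/3"),
                 other = c(15, 25, 31, 30, 75, 55),
                 stringsAsFactors = FALSE)

parts <- strsplit(df$x, "/", fixed = TRUE)
values <- sort(unique(unlist(parts)))

# Logical indicator matrix: one row per value, one column per row of df
out <- vapply(parts, function(p) values %in% p, logical(length(values)))
rownames(out) <- values

out[1, ]  # first row: which elements of x contain "1"
out[, 3]  # third column: which values occur in x[3]
out[1]    # single index: only the first cell of the matrix

# Logicals count as 0/1 in arithmetic, so no recoding is needed for totals
rowSums(out)

# Transpose so rows line up with df, then bind as ordinary columns
ind <- t(out)
colnames(ind) <- paste0("x_", values)
df2 <- cbind(df, ind)

# Cross-tabulate one indicator against a condition on another column
table(has_1 = df2$x_1, other_over_30 = df2$other > 30)
```

After the cbind step each value of x is an ordinary logical column of the data.frame, so it can be summarised or tabulated like any other variable without a third dimension.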