Apologies if this is obvious. I've found answers for cases where there's an index or where columns are missing, but I don't think either will work here.
Example data:
df.test=data.frame(A=c("n,n,y,n",   "t", "j,k,k"),
                   B=c("n,y,y,n",   "",  "k,k,k"),
                   C=c("n,y,y,n,n", "t", "j,k,j"),
                   D=c("",          "",  "k,k,j"))
df.test=lapply(df.test, function(x) as.character(x))
str(df.test) # looks similar to my data
List of 4
$ A: chr [1:3] "n,n,y,n" "t" "j,k,k"
$ B: chr [1:3] "n,y,y,n" "" "k,k,k"
$ C: chr [1:3] "n,y,y,n,n" "t" "j,k,j"
$ D: chr [1:3] "" "" "k,k,j"
My aim is a dataframe:
A B C D
n n n NA
n y y NA
y y y NA
n n n NA
t NA t NA
j k j k
k k k k
k k j j
I'd like column A to be the reference, although its values aren't unique. It does, however, hold the maximum number of values allowed for each list element (I hope that makes sense). So the fifth value in the first element of C should be dropped, i.e. n,y,y,n,n -> n,y,y,n.
Also, missing values need to be added as NA (missing relative to column A).
The extra value in C is a bug from other software (which I have no influence over). Apart from those extra values, the columns correspond to each other, e.g. the t's should be on the same row (if present).
The best I've managed so far is a list of vectors, but the vectors have different lengths, so I can't bind them together and their elements don't line up:
df3=lapply(df.test, function(x) unlist(strsplit(x,',')))
str(df3)
List of 4
$ A: chr [1:8] "n" "n" "y" "n" ...
$ B: chr [1:7] "n" "y" "y" "n" ...
$ C: chr [1:9] "n" "y" "y" "n" ...
$ D: chr [1:3] "k" "k" "j"
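For illustration, here is a minimal sketch of one way the rule described above might be applied. It is not a definitive solution. It assumes the surplus values always sit at the end of a cell (as with the fifth value in C), so that cutting each cell to the length of the matching cell in A is enough. The names parts, n.ref and fit.to.ref are made up for this example. The idea is to keep the result of strsplit as one vector per original row (no unlist up front), then cut or pad each cell to the number of values A has in that row.

```r
# Example data from above, as a list of character vectors
df.test <- list(
  A = c("n,n,y,n",   "t", "j,k,k"),
  B = c("n,y,y,n",   "",  "k,k,k"),
  C = c("n,y,y,n,n", "t", "j,k,j"),
  D = c("",          "",  "k,k,j")
)

# Split every cell separately, so the original row boundaries are kept
parts <- lapply(df.test, strsplit, split = ",", fixed = TRUE)

# Number of values column A holds in each original row (the reference)
n.ref <- lengths(parts$A)

# Cut or pad each cell to the length of the matching A cell.
# Indexing past the end of a vector yields NA, so x[seq_len(n)]
# both drops surplus values and fills in missing ones.
# Assumption: surplus values are always trailing.
fit.to.ref <- function(cells) {
  unlist(Map(function(x, n) x[seq_len(n)], cells, n.ref))
}

result <- as.data.frame(lapply(parts, fit.to.ref), stringsAsFactors = FALSE)
result
```

Each column of result then has one entry per value in A, so the columns can sit side by side. If the surplus values could appear anywhere other than the end of a cell, this simple truncation would not line the values up correctly.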