I have a data frame as follows:
Identifier V1 Location V2
1 12 A 21
1 12 B 24
2 20 B 15
2 20 C 18
2 20 B 23
3 43 A 10
3 43 B 17
3 43 A 18
3 43 B 20
3 43 C 25
3 43 A 30
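For a reproducible example, the table above can be typed straight into R. This is a sketch that assumes Location is stored as a factor, which is what the levels() call in the loop further down implies; the column types are otherwise a guess.

```r
# The question's data, retyped by hand.
# Location is assumed to be a factor (the loop below manipulates its levels).
Test <- data.frame(
  Identifier = c(1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3),
  V1         = c(12, 12, 20, 20, 20, 43, 43, 43, 43, 43, 43),
  Location   = c("A", "B", "B", "C", "B", "A", "B", "A", "B", "C", "A"),
  V2         = c(21, 24, 15, 18, 23, 10, 17, 18, 20, 25, 30),
  stringsAsFactors = TRUE
)
str(Test)
```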
I’d like to recast it so that there is a single row for each Identifier and one column for each value in the current Location column. I don’t care about the data in V1, but I need the data in V2; these will become the values in the new columns.
Note that Location has repeated values within Identifiers 2 and 3 (B appears twice for Identifier 2; A three times and B twice for Identifier 3).
I ASSUME that the first task is to make the Location values unique within each Identifier.
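For reference, base R's make.unique() leaves the first occurrence of each value unchanged and appends a counter to later repeats, using the separator given in sep. Applied to Identifier 3's Location values in row order, it gives the labels that appear in the second table:

```r
# Identifier 3's Location values, in row order (from the table above)
loc3 <- c("A", "B", "A", "B", "C", "A")

# First occurrences are kept; repeats become A-1, B-1, A-2
loc3_unique <- make.unique(loc3, sep = "-")
loc3_unique
```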
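I used the following loop (the data frame is called “Test”):
for (i in unique(Test$Identifier))  # loop over identifiers, not over rows
{
  temp <- Test$Location[Test$Identifier == i]
  temp1 <- make.unique(as.character(temp), sep = "-")  # e.g. B, C, B -> B, C, B-1
  # register the new labels as factor levels without duplicating existing ones
  levels(Test$Location) <- unique(c(levels(Test$Location), temp1))
  Test$Location[Test$Identifier == i] <- temp1
}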
This produces:
Identifier V1 Location V2
1 12 A 21
1 12 B 24
2 20 B 15
2 20 C 18
2 20 B-1 23
3 43 A 10
3 43 B 17
3 43 A-1 18
3 43 B-1 20
3 43 C 25
3 43 A-2 30
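The same per-Identifier relabelling can be sketched without an explicit loop using base R's ave(), which applies a function within each group and writes the results back in the original row order. This sketch assumes Location can be held as a character vector (wrap it in factor() afterwards if a factor is needed); the data are retyped from the first table.

```r
# Original data, retyped from the question, with Location as character
Test <- data.frame(
  Identifier = c(1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3),
  V1         = c(12, 12, 20, 20, 20, 43, 43, 43, 43, 43, 43),
  Location   = c("A", "B", "B", "C", "B", "A", "B", "A", "B", "C", "A"),
  V2         = c(21, 24, 15, 18, 23, 10, 17, 18, 20, 25, 30),
  stringsAsFactors = FALSE
)

# make.unique() is applied separately to each Identifier's Location values;
# ave() puts the relabelled values back in their original rows
Test$Location <- ave(Test$Location, Test$Identifier,
                     FUN = function(x) make.unique(x, sep = "-"))
Test$Location
```

Because each Identifier is handled once as a group, the work grows with the number of rows rather than with rows times identifiers; no timing on the real data is implied here.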
Then using
library(reshape)
cast(Test, Identifier ~ Location, value = "V2")
gives:
Identifier A B C B-1 A-1 A-2
1 21 24 NA NA NA NA
2 NA 15 18 23 NA NA
3 10 17 25 20 18 30
And this is more or less what I want.
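As a base R point of comparison (a swapped-in technique, not the reshape package), tapply() can build the same Identifier by Location table once Location has been made unique. This sketch assumes the relabelled data from the second table; the result is a matrix rather than a data frame (as.data.frame() converts it), and its column order follows the sorted Location labels, which can depend on the locale. Whether it uses less memory than cast() on the full data has not been measured here.

```r
# Data after relabelling (second table above), retyped by hand
Test <- data.frame(
  Identifier = c(1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3),
  Location   = c("A", "B", "B", "C", "B-1", "A", "B", "A-1", "B-1", "C", "A-2"),
  V2         = c(21, 24, 15, 18, 23, 10, 17, 18, 20, 25, 30),
  stringsAsFactors = FALSE
)

# One cell per (Identifier, Location) pair. Each cell holds a single V2
# value, so sum() just returns it; pairs that never occur become NA.
wide <- tapply(Test$V2, list(Test$Identifier, Test$Location), sum)
wide
```

If the relabelling step were skipped, sum() would silently add repeated Locations together, so making the labels unique first still matters with this approach.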
My questions are:
1. Is this the right way to handle the problem?
2. I know R people tend to avoid the “for” construction, so is there a more R-elegant (relegant?) way to do this? The real data set has over 160,000 rows and starts with over 50 unique values in the Location vector, and the loop takes just over an hour to run, so anything quicker would be good. I should also mention that, despite increasing the memory limit, I had to run cast on 20,000 to 30,000 rows of the output at a time and then merge all the cast outputs.
3. Is there a way to sort the columns in the output so that (here) they are A, A-1, A-2, B, B-1, C?
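One way to get the column order asked about is to sort the Location columns by their base label and then by the numeric suffix that make.unique() added. The helper below, order_location_columns(), is hypothetical (it is not part of reshape) and assumes a plain data frame with an "Identifier" column; cast() returns its own data frame subclass, which has not been checked against this sketch.

```r
# Hypothetical helper: put columns in the order A, A-1, A-2, B, B-1, C, ...
# sorting by base label first, then by the numeric suffix from make.unique()
order_location_columns <- function(df, id_col = "Identifier") {
  locs <- setdiff(names(df), id_col)
  has_suffix <- grepl("-[0-9]+$", locs)
  base <- sub("-[0-9]+$", "", locs)     # "B-1" -> "B"
  num <- integer(length(locs))          # labels without a suffix count as 0
  num[has_suffix] <- as.integer(sub("^.*-", "", locs[has_suffix]))  # "A-2" -> 2
  df[, c(id_col, locs[order(base, num)])]
}

# Toy version of the cast() output from the question
out <- data.frame(Identifier = 1:3,
                  A = c(21, NA, 10), B = c(24, 15, 17), C = c(NA, 18, 25),
                  `B-1` = c(NA, 23, 20), `A-1` = c(NA, NA, 18), `A-2` = c(NA, NA, 30),
                  check.names = FALSE)
sorted <- order_location_columns(out)
names(sorted)
```

Sorting on the suffix as a number keeps A-10 after A-9, which a plain character sort() of the names would not.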
Please be gentle with your replies!