I think my question is somewhat similar to this one. cbind is changing the values of the vectors I am using (or using references to the values). I am basically getting data from a data frame and then organizing it in columns according to a certain factor (interface type). I think it has something to do with the levels, but I am not sure what those even mean right now. Here is what I am doing and the results I am getting:

#Grouping subjects number of collisions data according to the interface they used
> ui1NumCollisions = dout$numCollisions[ dout$Interface=="0"]
> ui2NumCollisions = dout$numCollisions[ dout$Interface=="1"]
> ui3NumCollisions = dout$numCollisions[ dout$Interface=="2"]
> ui4NumCollisions = dout$numCollisions[ dout$Interface=="3"]
#checking data
> ui1NumCollisions
 [1] 43,  30,  37,  6,   22,  9,   19,  9,   14,  106, 50,  53, 
33 Levels: -1, 10, 106, 11, 12, 13, 14, 15, 16, 17, 18, 19, 2, 21, 22, ... 9,
> ui2NumCollisions
 [1] 17, 16, 23, 12, 15, -1, 11, 26, 19, 32, 36, 13,
33 Levels: -1, 10, 106, 11, 12, 13, 14, 15, 16, 17, 18, 19, 2, 21, 22, ... 9,
> ui3NumCollisions
 [1] 17, 38, 16, 13, 42, 50, 10, 17, 2,  28, 14, 30,
33 Levels: -1, 10, 106, 11, 12, 13, 14, 15, 16, 17, 18, 19, 2, 21, 22, ... 9,
> ui4NumCollisions
 [1] 42, 28, 22, 36, 10, 25, 45, 48, 18, 11, 21, 7, 
33 Levels: -1, 10, 106, 11, 12, 13, 14, 15, 16, 17, 18, 19, 2, 21, 22, ... 9,
#Creates matrix with each column containing collision data for each interface
#(I think)
> uiNumCollisions = cbind( '1' = ui1NumCollisions
+                        , '2' = ui2NumCollisions
+                        , '3' = ui3NumCollisions
+                        , '4' = ui4NumCollisions)
#checking matrix values
> uiNumCollisions
       1  2  3  4
 [1,] 26 10 10 25
 [2,] 20  9 24 19
 [3,] 23 16  9 15
 [4,] 31  5  6 22
 [5,] 15  8 25  2
 [6,] 33  1 29 17
 [7,] 12  4  2 27
 [8,] 33 18 10 28
 [9,]  7 12 13 11
[10,]  3 21 19  4
[11,] 29 22  7 14
[12,] 30  6 20 32
> uiNumCollisionsSummary = summary(uiNumCollisions)
> uiNumCollisionsSummary
       1               2               3              4        
 Min.   : 3.00   Min.   : 1.00   Min.   : 2.0   Min.   : 2.00  
 1st Qu.:14.25   1st Qu.: 5.75   1st Qu.: 8.5   1st Qu.:13.25  
 Median :24.50   Median : 9.50   Median :11.5   Median :18.00  
 Mean   :21.83   Mean   :11.00   Mean   :14.5   Mean   :18.00  
 3rd Qu.:30.25   3rd Qu.:16.50   3rd Qu.:21.0   3rd Qu.:25.50  
 Max.   :33.00   Max.   :22.00   Max.   :29.0   Max.   :32.00 

Notice that 106 does not appear in column 1, and the maximum there is 33 instead. So why are the values in uiNumCollisions different from those in the individual vectors (ui1NumCollisions, ui2NumCollisions, etc.)? It seems I am getting the indices of the values in the levels table, when what I really wanted were the values themselves. I assume this has a simple answer. I looked at a number of questions about binding data, but could not work out a solution from what I found. What am I missing here?

Thanks in advance for the help. Sincerely,

Paulo.

/-------FOLLOW - UP based on reply from DWin-------

Thanks for the reply. Building uiNumCollisions with data.frame instead of cbind got the right data in there. However, when I apply the summary function:

uiNumCollisionsSummary = summary(uiNumCollisions)

I no longer get the statistics I used to (mean, median, etc.). Why is that?

In addition, after that, I want to apply a boxplot to uiNumCollisions and then an ANOVA. For the boxplot, I use the following:

par( fig=c(0.0,1.0,0.0,1.0))
temp = boxplot( uiNumCollisions)

The result I get for the boxplot is

"Error in oldClass(stats) <- cl :  adding class "factor" to an invalid object"

For the ANOVA I was using the following code:

temp = c(ui1NumCollisions, ui2NumCollisions, ui3NumCollisions, ui4NumCollisions)
temp.type = rep(c("1", "2", "3", "4"), c(12,12,12,12))
temp.type = factor(temp.type)
options(contrasts = c("contr.helmert", "contr.poly"))
uiNumCollisionsAOV = aov(temp ~ temp.type)
summary(uiNumCollisionsAOV)

However, this obviously will not work unless I convert each column to something else. I tried different fixes, like reapplying factor to each column (e.g. ui1NumCollisions = factor(ui1NumCollisions)). That fixed the factor levels, but when I tried to convert back to numeric values using something like as.numeric(levels(ui1NumCollisions)[ui1NumCollisions]), I only got NAs. So your solution did work and I really appreciate it, but it does not completely resolve my problem. Is there an easier way around this? Perhaps simply importing the dout table in a way that gives me all the data without factors, which would then resolve all the factor issues I am having?
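The NA result described above can be reproduced in miniature. This is only a sketch: the factor `f` is a hypothetical stand-in whose labels carry a trailing comma, as in the printed output earlier in the question, and stripping the comma with `sub` is one possible workaround rather than the fix that was eventually used.

```r
# Hypothetical factor with comma-suffixed labels, like those printed above.
f <- factor(c("43,", "30,", "106,"))

# The labels are not valid numbers ("43," has a trailing comma), so the
# usual factor-to-numeric idiom yields NA for every element.
bad <- suppressWarnings(as.numeric(levels(f))[f])
print(bad)

# Removing the comma from each label before converting recovers the values.
good <- as.numeric(sub(",", "", as.character(f), fixed = TRUE))
print(good)
```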

/-------FOLLOW - UP #2-------

I finally found what the problem was. There were commas between data instead of simply spaces. The file, data.out looked like this:

Subject, uiType, numCollisions, startTimeTraining, startTime, endTime, detlaTraining, deltaTask
0, 0, 43, 0, 510.261, 1743.75, 510.261, 1233.49
1, 1, 17, 0, 1198.65, 2044.62, 1198.65, 845.965
2, 2, 17, 0, 445.788, 1622.83, 445.788, 1177.04
3, 3, 42, 0, 254.793, 1196.93, 254.793, 942.132
4, 1, 16, 0, 1583.5, 2887.39, 1583.5, 1303.9
5, 2, 38, 0, 79.095, 886.533, 79.095, 1287.438
6, 3, 28, 0, 866.75, 1617.48, 866.75, 750.73
7, 1, 23, 0, 565.575, 1361.79, 565.575, 796.216
8, 2, 16, 0, 1211.99, 2538.37, 1211.99, 1326.38
...

And it was supposed to look like this.

Subject uiType numCollisions startTimeTraining startTime endTime detlaTraining deltaTask
0 0 43 0 510.261 1743.75 510.261 1233.49
1 1 17 0 1198.65 2044.62 1198.65 845.965
2 2 17 0 445.788 1622.83 445.788 1177.04
3 3 42 0 254.793 1196.93 254.793 942.132
4 1 16 0 1583.5 2887.39 1583.5 1303.9
5 2 38 0 79.095 886.533 79.095 1287.438
6 3 28 0 866.75 1617.48 866.75 750.73
7 1 23 0 565.575 1361.79 565.575 796.216
8 2 16 0 1211.99 2538.37 1211.99 1326.38
...

When I loaded the data table using these lines:

numSamples = 8#or more
dout = read.table("data.out", header = TRUE)
dout = dout[1:numSamples,]
dout

I would get a weird table filled with integers with commas attached, which broke the conversion of my data to numbers and gave me those factors.

After I fixed that, the original code worked like a charm.
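An alternative to editing the file would be to tell read.table that the separator is a comma. The sketch below assumes a few rows copied from the comma-separated listing above and passes them through the `text` argument instead of a file; the `sep` and `strip.white` settings are my assumption about what suits this layout, not what was actually run.

```r
# A few rows copied from the comma-separated version of data.out shown above.
csv_text <- "Subject, uiType, numCollisions, startTimeTraining, startTime, endTime, detlaTraining, deltaTask
0, 0, 43, 0, 510.261, 1743.75, 510.261, 1233.49
1, 1, 17, 0, 1198.65, 2044.62, 1198.65, 845.965
2, 2, 17, 0, 445.788, 1622.83, 445.788, 1177.04
3, 3, 42, 0, 254.793, 1196.93, 254.793, 942.132"

# sep = "," splits fields on commas; strip.white = TRUE drops the blanks
# that follow each comma.
dout <- read.table(text = csv_text, header = TRUE, sep = ",",
                   strip.white = TRUE)

# Columns should now be numeric rather than factors.
str(dout)
```

With a real file, the same arguments would go to `read.table("data.out", header = TRUE, sep = ",", strip.white = TRUE)`.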

I appreciate the help from DWin and the opportunity to post this issue here, even though it was a rather silly mistake on my part.

Lesson learned: double-check your data after you wake up instead of before going to bed.

Thanks,

Paulo.

1 Answer

Because you extracted those factor columns as vectors, they lost the 'data.frame' class. So it was not so much changing the labels as losing them entirely. When you used cbind, the result was a matrix. Matrices lose any factor attributes, and factor labels are stored in the attributes, so the content of the matrix became the factor indices (the integer codes) rather than the factor labels. If you had used the data.frame function instead of cbind, your labels would have remained intact. You probably don't want your column names to be digits, though.

uiNumCollisions = data.frame( one = ui1NumCollisions
                    , two = ui2NumCollisions
                    , three = ui3NumCollisions
                    , four = ui4NumCollisions)
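To see the difference on a small scale, here is a minimal sketch. The factors `f1` and `f2` are made-up stand-ins for the extracted columns, not your actual data.

```r
# Made-up factors standing in for two of the extracted columns.
f1 <- factor(c("43,", "30,", "106,"))
f2 <- factor(c("17,", "16,", "23,"))

# cbind() on bare factor vectors builds a matrix of the internal integer
# codes; the labels are dropped.
m <- cbind(one = f1, two = f2)
print(m)

# data.frame() keeps each column as a factor, so the labels survive.
d <- data.frame(one = f1, two = f2)
print(d)
str(d)
```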

It might help to look at:

str(ui1NumCollisions)
attributes(ui1NumCollisions)

Strategy 2: You could have kept the NumCollisions extracts as data.frames with:

 ui1NumCollisions = dout[ dout$Interface=="0", "numCollisions", 
                                              drop=FALSE]

Then you would be using cbind.data.frame (behind the scenes) when you called cbind.
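A small sketch of that second strategy, assuming a toy `dout` with only two interface types (the data here is invented for illustration):

```r
# Invented stand-in for dout with factor columns.
dout <- data.frame(Interface = factor(c("0", "1", "0", "1")),
                   numCollisions = factor(c("43,", "17,", "30,", "16,")))

# drop = FALSE keeps each extract as a one-column data.frame.
ui1 <- dout[dout$Interface == "0", "numCollisions", drop = FALSE]
ui2 <- dout[dout$Interface == "1", "numCollisions", drop = FALSE]

# Rename so the combined columns are distinguishable.
names(ui1) <- "one"
names(ui2) <- "two"

# With data.frame arguments, cbind returns a data.frame and the
# columns remain factors.
both <- cbind(ui1, ui2)
class(both)
str(both)
```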

IRTFM
  • Thanks for the reply, please see updates above. By the way, the last option ( ui1NumCollisions = dout$numCollisions[ dout$Interface=="0", "numCollisions", drop=FALSE]) gave me a too many dimensions error, I guess because of the third parameter in the brackets. – Paulo de Barros Jun 20 '13 at 03:46
  • Right. (No data offered for testing, so you get what you paid for.) It should have been `ui1NumCollisions = dout[ dout$Interface=="0", "numCollisions", drop=FALSE]` – IRTFM Jun 20 '13 at 04:51
  • Hi, DWin, oops, I did not notice this crass mistake. And you are right, I should apologize for not having provided the data file. It turned out the data file really was the problem. I was assuming whitespace separated my data, when in fact it was commas. This led to the commas being read as part of each datum, so numerical conversion could never happen correctly. The kind of mistake that happens when you work too many hours on the same thing while half-asleep. I really appreciate your help, DWin. It gave me a lot of insight into how R works. Thanks. – Paulo de Barros Jun 21 '13 at 02:05