
I have run a function from the mclust package on a data frame with no missing values. That nonmissing data frame was created with dplyr's select function, so the data frame passed to the mclust function carries a row.names vector.

Next, I extracted some critical values (the case 'classification') from the fitted object:

class<-functionobject$classification

Thus, the numeric vector of classification values follows the row order (the row.names) of that input data frame.

When I append this vector to a new data frame with the same number of rows (the same cases) but without those row.names, the ordering appears to be lost. I can tell because when I compare the classification groups on other variables in the new data frame, the results do not match the values obtained in the mclust function using those same variables.

The reason I cannot simply append to the nonmissing data frame (with row.names) used in the mclust function is that I also need other variables from the data set that were not used in the function and that have to be merged on ID variables:

NEW_DF=merge(mclust_DF, other_DF, by=c("X1", "X2"))

So I end up with a data frame of the same length that no longer has the original row.names, and it is to this data frame that I want to append the classification values from the mclust function described above. Although no errors are thrown when I use:

FINAL_DF<- cbind.data.frame(NEW_DF, class)

The data are misaligned: on inspection, the group (class) means on the relevant variables do NOT equal those from the mclust function (which they should, since it is the same core input data).
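A minimal sketch of what seems to be happening, using made-up toy data: mclust_DF, other_DF and the class vector below are hypothetical stand-ins, and no mclust call is made. Base R's merge() sorts its result on the by columns by default (sort = TRUE), so binding the classification vector by position afterwards pairs labels with the wrong cases:

```r
# Hypothetical stand-ins: mclust_DF plays the nonmissing data frame passed to
# mclust, and `class` plays functionobject$classification (one label per row,
# in mclust_DF's row order).
mclust_DF <- data.frame(X1 = c(3, 1, 2), X2 = c("c", "a", "b"), y = c(30, 10, 20))
class <- c(2L, 1L, 1L)  # labels for rows 1, 2, 3 of mclust_DF

other_DF <- data.frame(X1 = c(1, 2, 3), X2 = c("a", "b", "c"), z = c(100, 200, 300))

# merge() sorts on the `by` columns by default, so NEW_DF's rows are
# ordered X1 = 1, 2, 3 instead of mclust_DF's 3, 1, 2.
NEW_DF <- merge(mclust_DF, other_DF, by = c("X1", "X2"))

# Binding by position attaches row 1's label (2) to the case X1 = 1,
# whose true label is 1: the groups are now scrambled.
FINAL_DF <- cbind.data.frame(NEW_DF, class = class)
```

No error is raised because cbind only checks that the lengths agree, not that the rows refer to the same cases.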

I realize I am missing something obvious here, but I have not found an answer despite an exhaustive search of the archives. What is the correct way to go about this rather tedious wrangling?

Ronak Shah
Jhaltiga68

1 Answer


FWIW: a simple, though perhaps still inefficient, solution was to bind the saved classification data from the mclust function to the nonmissing data frame BEFORE merging in the additional validation variables. When the merge occurs, the row.names carried over from the dplyr select step are lost and the cases are re-sorted (merge() sorts its result on the by columns by default), so anything bound by position afterwards no longer lines up.
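A sketch of this fix on the same kind of toy data (mclust_DF, other_DF and the class vector are hypothetical stand-ins for the real objects):

```r
# Hypothetical stand-ins for the real data and mclust output.
mclust_DF <- data.frame(X1 = c(3, 1, 2), X2 = c("c", "a", "b"), y = c(30, 10, 20))
class <- c(2L, 1L, 1L)  # in mclust_DF's row order, as returned by mclust
other_DF <- data.frame(X1 = c(1, 2, 3), X2 = c("a", "b", "c"), z = c(100, 200, 300))

# Attach the labels while mclust_DF is still in the row order mclust saw.
mclust_DF$class <- class

# merge() may reorder the rows, but each label now travels with its case.
FINAL_DF <- merge(mclust_DF, other_DF, by = c("X1", "X2"))
```

One caveat: merge() performs an inner join by default, so cases with no match in other_DF are dropped from FINAL_DF; passing all.x = TRUE keeps them.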

This solution dawned on me when I realized that the mclust function was run on the nonmissing data frame (created using dplyr), so its resulting data objects follow the case ordering of that input data (by row.names).
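Because the labels follow the input row order, another option (not the one used above) is to store them next to the ID columns and join by key rather than by position, which also works if the labels have to be added after other merges. This sketch assumes X1 and X2 together identify each case uniquely; the toy data are hypothetical:

```r
# Hypothetical stand-ins for the real data and mclust output.
mclust_DF <- data.frame(X1 = c(3, 1, 2), X2 = c("c", "a", "b"), y = c(30, 10, 20))
class <- c(2L, 1L, 1L)
other_DF <- data.frame(X1 = c(1, 2, 3), X2 = c("a", "b", "c"), z = c(100, 200, 300))
NEW_DF <- merge(mclust_DF, other_DF, by = c("X1", "X2"))  # rows re-sorted

# Pair each label with the ID columns of the row it came from.
class_DF <- data.frame(mclust_DF[, c("X1", "X2")], class = class)

# The keys must be unique, otherwise the join would duplicate cases.
stopifnot(!anyDuplicated(class_DF[, c("X1", "X2")]))

# Join by key instead of binding by position.
FINAL_DF <- merge(NEW_DF, class_DF, by = c("X1", "X2"))
```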

Jhaltiga68