I have a set of multivariate instances and need to extract a representative subset from it. For example, given 100,000 multivariate instances, I want to extract 1,000 instances that are representative of the original distribution. I used Latin Hypercube Sampling and random sampling to extract two representative sets, and now I want to check how well each of these two sets corresponds to the original set.
To elaborate:
I have 100,000 multivariate instances (let's call this set A).
I derive two representative samples from A, each with 1,000 instances (let's call these sets B and C).
I want to check whether B and C preserve the distribution of the original set A.
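To make the kind of check I mean concrete, here is a minimal sketch of one possible approach, not a definitive method. It assumes the data is a numeric NumPy array with one row per instance, compares each feature's marginal distribution using a two-sample Kolmogorov-Smirnov test (`scipy.stats.ks_2samp`), and compares the correlation matrices. The function name `compare_to_original` is hypothetical and the data is synthetic, standing in for A and for the randomly sampled set.

```python
import numpy as np
from scipy.stats import ks_2samp


def compare_to_original(original, sample):
    """Return per-feature KS statistics and the largest correlation-matrix gap.

    Hypothetical helper: both inputs are 2-D arrays (rows = instances).
    """
    # Marginal check: one two-sample KS statistic per feature (0 = identical ECDFs).
    ks_stats = np.array([
        ks_2samp(original[:, j], sample[:, j]).statistic
        for j in range(original.shape[1])
    ])
    # Dependence check: largest absolute difference between correlation matrices.
    corr_gap = np.abs(
        np.corrcoef(original, rowvar=False) - np.corrcoef(sample, rowvar=False)
    ).max()
    return ks_stats, corr_gap


rng = np.random.default_rng(0)

# Synthetic stand-in for A: 100,000 instances with 3 correlated features (assumption).
cov = [[1.0, 0.5, 0.2], [0.5, 1.0, 0.3], [0.2, 0.3, 1.0]]
A = rng.multivariate_normal(np.zeros(3), cov, size=100_000)

# Stand-in for the random sample: 1,000 rows drawn without replacement.
C = A[rng.choice(len(A), size=1000, replace=False)]

ks_stats, corr_gap = compare_to_original(A, C)
print("KS statistic per feature:", ks_stats)
print("Max abs difference in correlations:", corr_gap)
```

Smaller values mean the sample is closer to A. Note that per-feature KS tests only look at the marginals, and the correlation comparison only captures linear dependence, so this would not detect every way in which a sample could fail to preserve the joint distribution.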
Thanks a lot in advance!