
I have measurements of 5 devices at two different points of time. A measurement basically consists of an array of ones and zeros corresponding to a bit value at the corresponding location:

whos measurement1_dev1_time1

Name                         Size               Bytes  Class      Attributes

measurement1_dev1_time1      4096x8             32768  logical

I assume that for a specific device the changes between time 1 and 2 of the measurements are unique. However, since I am dealing with 32768 bits at different locations, it is quite hard to visualize if there is some kind of dependency.

As every bit at location x can be regarded as one dimension of an observation, I thought of using PCA to reduce the number of dimensions.

Thus, for each of the 5 devices:

  1. I randomly sample n measurements at points t1 and t2 separately.
  2. I prepare an array as input for pca() with m*n columns (m < 32768; it is a subset of all the observed bits, as the original data might be too big for pca) and 4 rows (one row for each device).
  3. On this array A I calculate the PCA: `[coeff score latent] = pca(zscore(A))`
  4. Then I try to visualize it using biplot: `biplot(coeff(:,1:2), 'score', score(:,1:2))`

However, this gives me really strange results. Maybe PCA is not the right approach for this problem? I also tried running the PCA not on the logical bit array itself but on a vector holding the indices where the original measurement array contains a '1'. This also produces strange results.

As I am completely new to PCA, I want to ask whether you see a flaw in the process, or whether PCA is just not the right approach for my goal and I should look at other dimension reduction approaches or clustering algorithms instead.

Richard Laurant

1 Answer

Can this 'some kind of dependency' be just pairwise correlation of your data points? Or what do you want to find out?

Do you get 'expected results' if you do:

meas_norm = 2*measurement1_dev1_time1 - 1;

CovarianceMatrix = meas_norm' * meas_norm;

figure
pcolor(CovarianceMatrix )

Could the data type be a problem? Try feeding double(data). (Please add proper code to your example.)

If you are looking for dimension reduction, you could also consider ICA.
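A minimal sketch of the conversion suggested here, using random bits as a hypothetical stand-in for one measurement (the name `bits`, the seed, and the 0.5 threshold are assumptions, not part of the original data). Unlike the plain product `meas_norm' * meas_norm`, `cov` subtracts the column means before multiplying:

```matlab
rng(0);                          % fixed seed so the sketch is repeatable (assumption)
bits = rand(4096, 8) > 0.5;      % hypothetical stand-in for measurement1_dev1_time1
X = double(bits);                % explicit conversion from logical to double
C = cov(X);                      % 8x8 covariance of the columns, mean-centered

figure
imagesc(C)                       % imagesc draws every cell; pcolor omits the last row and column
colorbar
```

With only 8 columns the matrix is small, so this mainly checks that the numeric pipeline behaves as expected before moving on to larger inputs.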


UPD: Can you probe it with xor? Since xor cannot be applied along rows or columns, you can use all(x, dimension) as a workaround:

example = imread('cameraman.tif')>128;

meas_points = numel(example);
num_sensors = 4;

%// simulate data for t1
meas_before = repmat(example(:), 1, num_sensors);
flickering_before = (rand(meas_points, num_sensors)<0.001);
meas_before(flickering_before) = ~meas_before(flickering_before);

%// simulate position of changing pixels, let's say 8%
true_change = (rand(meas_points, 1) < 0.08);

%// simulate data for t2
meas_after = repmat(example(:), 1, num_sensors);
meas_after(true_change, :) = ~meas_after(true_change, :);
flickering_after = (rand(meas_points, num_sensors)<0.001);
meas_after(flickering_after) = ~meas_after(flickering_after);

stable_points_after = all(meas_after, 2) | all(~meas_after, 2);
stable_point_fraction = sum(stable_points_after)./ meas_points;

%// similarly for the states before (i.e. t1)
stable_points_before = all(meas_before, 2) | all(~meas_before, 2);   

%// now see which points are stable at both times but changed value between t1 and t2
stable_both = stable_points_before & stable_points_after;
stable_change = xor(meas_before(stable_both, 1), meas_after(stable_both, 1));
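
One way to make the device-specific change between t1 and t2 visible, sketched under assumptions: simulate n repeated measurements per device at each time, take the per-bit majority value, and XOR the two majorities to get a per-device change mask. The masks can then be compared pairwise by Hamming distance. All sizes, the flip and noise rates, and the variable names below are hypothetical; this illustrates the idea on synthetic data and is not a validated method.

```matlab
rng(0);                                          % fixed seed (assumption)
num_devices = 5;  num_bits = 1024;  n = 21;      % hypothetical sizes; n odd to avoid ties
change_mask = false(num_devices, num_bits);

for d = 1:num_devices
    pattern_t1 = rand(1, num_bits) > 0.5;        % device-specific preferred bits at t1
    flips      = rand(1, num_bits) < 0.2;        % bits whose preferred value changes
    pattern_t2 = xor(pattern_t1, flips);

    % n noisy repetitions of each pattern, with 10% random bit flips
    meas_t1 = xor(repmat(pattern_t1, n, 1), rand(n, num_bits) < 0.1);
    meas_t2 = xor(repmat(pattern_t2, n, 1), rand(n, num_bits) < 0.1);

    % majority vote per bit estimates the preferred value at each time
    major_t1 = mean(meas_t1, 1) > 0.5;
    major_t2 = mean(meas_t2, 1) > 0.5;

    change_mask(d, :) = xor(major_t1, major_t2); % bits that changed from t1 to t2
end

% pairwise Hamming distance between the change masks of the devices
hamming = zeros(num_devices);
for a = 1:num_devices
    for b = 1:num_devices
        hamming(a, b) = sum(xor(change_mask(a, :), change_mask(b, :)));
    end
end

figure
imagesc(hamming), colorbar
```

If the changes really are unique per device, the off-diagonal distances should be clearly larger than the distance between two independent estimates of the same device's mask.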
Dima Lituiev
  • The correlation I expect is the following: given a particular device and n measurements at time t, more than 90% of the bits at their individual positions will be the same across the n measurements (thus, there will be 10% noise, which are bit flips). However, the positions of the stable bits as well as their preferred values (0 or 1) will change from t1 to t2 for each device individually. I want to visualize this uniqueness of the changes. Thus, your code using the CovarianceMatrix does not express what I want to express. I will look into ICA and add some code. Thanks a lot. – Richard Laurant Nov 17 '14 at 16:41
  • It depends on what you are referring to as pairwise correlation between my data points: indeed the expected correlation is always between the i-th data point of a measurement at point t1 and the i-th data point of a measurement at point t2. However, only looking at all the data points reveals the characteristics between t1 and t2, which are unique for a single device. – Richard Laurant Nov 19 '14 at 13:56
  • There is a problem with the updated code: stable_points_before is undefined. Should it be: stable_points_before = all(meas_before, 2) | all(~meas_before, 2); – Richard Laurant Nov 24 '14 at 09:08
  • exactly, like for '*after' – Dima Lituiev Nov 24 '14 at 11:50