
I am working on a classification problem in biometrics. I compare each probe in the testing set with the gallery using the Euclidean distance.

Every time I run the code I get different results. If I remove the scaler, I always get the same results.

Why does the scaler produce different values? The difference is slight: sometimes it recognizes 10 more probes, sometimes 10 fewer. Thanks to all who answer.

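from numpy import load
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Fit the scaler on the training set only, then apply it to the testing set
scaler = StandardScaler()
training_walks_matrix = load('training_imputeZero.npy')
training_scaled = scaler.fit_transform(training_walks_matrix)
testing_walks_matrix = load('testing_imputeZero.npy')
testing_scaled = scaler.transform(testing_walks_matrix)
# Reduce both sets to 50 principal components learned from the training set
pca = PCA(n_components=50).fit(training_scaled)
training_walks_matrix = pca.transform(training_scaled)
testing_walks_matrix = pca.transform(testing_scaled)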
petezurich

1 Answer


My best guess is that PCA is using the arpack or randomized solver behind the scenes. By default (svd_solver='auto') the solver is chosen automatically from the shape of the data and n_components, and both the arpack and randomized solvers depend on a random initialization. In that case, you need to fix the random seed to reproduce the results.

Fix the seed by passing a value to the random_state argument when you create the PCA instance.

from numpy import load
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

myseed = 0

scaler = StandardScaler()
training_walks_matrix = load('training_imputeZero.npy')
training_scaled = scaler.fit_transform(training_walks_matrix)
testing_walks_matrix = load('testing_imputeZero.npy')
testing_scaled = scaler.transform(testing_walks_matrix)

# The only change: pass random_state so the solver starts from the same seed on every run
pca = PCA(n_components=50, random_state=myseed).fit(training_scaled)

training_walks_matrix = pca.transform(training_scaled)
testing_walks_matrix = pca.transform(testing_scaled)
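
If you want to see the effect in isolation, here is a minimal sketch on synthetic data. The array X and its 600 x 600 shape are made up for the example, and svd_solver is set explicitly (an assumption, so the behaviour does not depend on what 'auto' happens to pick for your data).

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in for the real data (hypothetical shape: 600 samples, 600 features)
rng = np.random.RandomState(42)
X = rng.normal(size=(600, 600))


def project(seed):
    # Force the randomized solver so the effect of the seed is visible
    pca = PCA(n_components=50, svd_solver="randomized", random_state=seed)
    return pca.fit_transform(X)


# Same seed: the two projections agree
same_seed = np.allclose(project(0), project(0))

# Different seeds: the projections can differ, which is what changes the distances
max_diff = np.abs(project(0) - project(1)).max()

# The exact solver uses no random initialization, so it needs no seed
full_a = PCA(n_components=50, svd_solver="full").fit_transform(X)
full_b = PCA(n_components=50, svd_solver="full").fit_transform(X)
full_same = np.allclose(full_a, full_b)

print("same seed agrees:", same_seed)
print("largest difference between seeds:", max_diff)
print("full solver agrees:", full_same)
```

If the exact decomposition is affordable for your matrix size, svd_solver='full' is another way to get repeatable output without choosing a seed.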
seralouk
  • Thank you. Now all my tests give the same results. I see the random_state parameter in the docs, but I don't understand what it is. Can you explain it simply? –  Apr 15 '21 at 20:42
  • Internally, the code uses random initializations. If you do not set the random seed, a different seed is used on each run and you get different results. To get the same results every time, you need to set the random seed. – seralouk Apr 15 '21 at 20:55
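
To make the idea of a seed concrete, here is a small sketch with NumPy's random generator. It is not the code scikit-learn runs internally, only an illustration of the principle; the variable names are made up for the example.

```python
import numpy as np

# Two generators created with the same seed produce the same sequence
a = np.random.RandomState(0).rand(3)
b = np.random.RandomState(0).rand(3)

# A generator created without a seed is initialised from fresh entropy,
# so its output changes from one run of the script to the next
c = np.random.RandomState().rand(3)

print(np.array_equal(a, b))  # True
print(c)                     # different on every run
```

Passing random_state=0 to PCA plays the same role as the 0 above: the "random" starting point becomes the same on every run.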