I have an array of 50 samples of x,y,z-coordinates. These samples represent a linear movement of an arm. I want to do the following:
- Find the best fitting line in the 3d-coordinate system for 50 samples.
- Based on the new calculated line, check if a new point (sample 51) has a distance to the line below a certain threshold.
So far, I tried to use PCA to calculate the best fitting line, but I have some trouble to understand it. I have to do it by hand. First I generated an array of 50 samples.
import random
import numpy as np
x, y, z = np.mgrid[-30:-10:50j], np.mgrid[-50:-30:50j], np.mgrid[0:-10:50j]
points = np.concatenate((x[:, np.newaxis],
y[:, np.newaxis],
z[:, np.newaxis]),
axis=1)
points += np.random.normal(size=points.shape) * 0.8
print(np.shape(points))
This resulted in the following picture. 3D-Plot of 50 samples
Afterwards I have to calculate the cov-matrix and the eigenvectors/eigenvalues for the PCA.
# Mean of every column.
x, y, z = points[:,0], points[:,1], points[:,2]
mean = np.array((sum(x)/len(x), sum(y)/len(y), sum(z)/len(z)))
# Cov-Matrix
N, M = np.shape(points)
cov = np.zeros((M,M))
for i in range(M):
for j in range(M):
summe = 0
for k in range(N):
summe += (points[k][i] - mean[i]) * (points[k][j] - mean[j]) / (N-1)
cov[i, j] = summe
# eigenvectors and eigenvalues. Safe first principle component to variable "eig"
eigval, eigvec = np.linalg.eig(cov)
eig = eigvec[:,np.argmax(eigval)]
The eigenvector corresponding to the highest eigenvalue is our first component. To compare the results, I used sklearn to test whether my values are right or not.
Compared to PCA my values for the direction of the vector are the same:
eig: [-0.67634469 -0.66399367 0.31885775]
sklearn: [ 0.67634469 0.66399367 -0.31885775]
And now I do not have any clue, how I can use this direction vector to calculate a distance to a new point. Furthermore, if I try to plot it, the points and the cloud are off due to an offset.
# Plot points in 3D
def plot_line(arr, line):
fig = plt.figure()
ax = plt.axes(projection='3d')
x, y, z = arr[:,0], arr[:,1], arr[:,2]
ax.scatter3D(x, y, z)
segments = np.arange(-20, 20)[:, np.newaxis] * line
lineplot = ax.plot(*(segments).T, color="red")
lineplot = ax.plot(*(line).T, 'ro')
3D-Coordinate System with points and off direction vector
How to I get the point that line passes through, when I have the slope? Is there a better/faster alternative to get the line or is PCA the best way to do it?
Thank your for your help!