Please help me understand this idea from the paper "Scene Summarization for Online Image Collections" by Ian Simon, Noah Snavely, and Steven M. Seitz (University of Washington).
Computing the Feature-Image Matrix:
We first transform the set of views into a feature-image
incidence matrix. To do so, we use the SIFT keypoint detector
to find feature points in all of the images in V. The
feature points are represented using the SIFT descriptor.
Then, for each pair of images, we perform feature matching
on the descriptors to extract a set of candidate matches.
We further prune the set of candidates by estimating a fundamental
matrix using RANSAC and removing all inconsistent
matches. After the previous step is complete for all images,
we organize the matches into tracks,
where a track is a connected component of features. We remove
tracks containing fewer than two features total, or at
least two features in the same image. At this point, we consider
each track as corresponding to a single 3D point in S.
From the set of tracks, it is easy to construct the |S|-by-|V|
feature-image incidence matrix.
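
One possible reading of the last two steps, as a minimal Python sketch. It is not code from the paper. It assumes each feature is identified by a hypothetical (image index, feature index) pair and that `matches` is the list of pairwise matches left after the RANSAC pruning. Connected components are found with a union-find structure, which is my choice; the paper does not say how the components are computed. The matrix is assumed to be binary, with entry 1 when a track (row) has a feature in an image (column).

```python
from collections import defaultdict


def build_tracks(matches):
    """Group pairwise matches into tracks (connected components).

    `matches` is an assumed input format: an iterable of
    ((image_a, feature_a), (image_b, feature_b)) pairs.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Every match links two features, so merge their components.
    for a, b in matches:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    # Collect the features belonging to each component.
    components = defaultdict(list)
    for node in list(parent):
        components[find(node)].append(node)

    tracks = []
    for features in components.values():
        images = [image for image, _ in features]
        # Drop tracks with fewer than two features, or with two or
        # more features in the same image (an inconsistent track).
        if len(features) < 2 or len(images) != len(set(images)):
            continue
        tracks.append(sorted(features))
    return sorted(tracks)


def incidence_matrix(tracks, num_images):
    """Build the |S|-by-|V| matrix: one row per track, one column per image."""
    matrix = [[0] * num_images for _ in tracks]
    for row, track in enumerate(tracks):
        for image, _ in track:
            matrix[row][image] = 1
    return matrix


# Toy example with 3 images (indices 0, 1, 2).
example_matches = [
    ((0, 5), (1, 2)), ((1, 2), (2, 7)),                    # consistent track
    ((0, 1), (1, 4)), ((1, 4), (2, 3)), ((2, 3), (0, 9)),  # image 0 appears twice
    ((1, 8), (2, 0)),                                      # two-image track
]
example_tracks = build_tracks(example_matches)
example_matrix = incidence_matrix(example_tracks, num_images=3)
print(example_tracks)  # [[(0, 5), (1, 2), (2, 7)], [(1, 8), (2, 0)]]
print(example_matrix)  # [[1, 1, 1], [0, 1, 1]]
```

In the toy example the second group of matches chains features 1 and 9 of image 0 into the same component, so that track is discarded. Each surviving track becomes one row of the matrix, and the row order is arbitrary.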
The part I am confused about is the italicized one.
How do we organize the matches into tracks?
And how do we construct the feature-image incidence matrix?
Please help.