The VLAD vector (Vector of Locally Aggregated Descriptors) is a descriptor vector which is often used in image retrieval, e.g. to aggregate SIFT descriptors. Use this tag for programming questions related to the VLAD vector.
The VLAD vector was introduced in 2010 (see link below) and is widely used in image retrieval and computer-vision applications, e.g. to aggregate SIFT descriptors. This tag should be used for all questions concerning the implementation and usage of VLAD vectors.
A typical pipeline with the VLAD vector consists of the following steps.
- Extract descriptors from one or multiple images, usually with a SIFT detector.
- Create a codebook of visual words, typically with k-means.
- Assign each descriptor to its nearest cluster center, then compute and accumulate the residuals between the descriptors and that center, separately for each cluster in the codebook.
- Stack the accumulated residuals to form the VLAD vector.
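The aggregation steps above can be sketched in a few lines of NumPy. This is only a minimal illustration, not the VLFeat implementation: the function name `compute_vlad` is hypothetical, the codebook is assumed to have been computed already (e.g. with k-means), and random data stands in for real SIFT descriptors.

```python
import numpy as np

def compute_vlad(descriptors, centers):
    """Aggregate local descriptors (n, d) against a codebook (k, d) into a k*d vector."""
    # Squared Euclidean distance from every descriptor to every cluster center.
    dists = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    # Hard-assign each descriptor to its nearest visual word.
    nearest = dists.argmin(axis=1)
    k, d = centers.shape
    vlad = np.zeros((k, d), dtype=np.float64)
    for i in range(k):
        members = descriptors[nearest == i]
        if len(members) > 0:
            # Sum of residuals (descriptor minus center) for cluster i.
            vlad[i] = (members - centers[i]).sum(axis=0)
    # Stack the per-cluster sums into a single vector.
    return vlad.reshape(-1)

# Toy data (assumption): random vectors in place of 128-D SIFT descriptors
# and in place of a k-means codebook with 16 visual words.
rng = np.random.default_rng(0)
descriptors = rng.normal(size=(500, 128))
centers = rng.normal(size=(16, 128))
v = compute_vlad(descriptors, centers)
print(v.shape)  # (2048,)
```

The resulting vector has length k * d (number of visual words times descriptor dimension), regardless of how many descriptors the image contains.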
Then, typically the following two steps are performed:
- Normalize the VLAD, e.g. using the L2 norm.
- Reduce the dimensionality, e.g. using PCA.
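A rough sketch of these two post-processing steps, using only NumPy, might look as follows. The helper names `l2_normalize` and `pca_reduce` are hypothetical, PCA is done here via an SVD of the centered data, and random rows stand in for a training set of real VLAD vectors.

```python
import numpy as np

def l2_normalize(v, eps=1e-12):
    """Scale v to unit L2 norm; eps guards against division by zero."""
    norm = np.linalg.norm(v)
    return v / max(norm, eps)

def pca_reduce(vlads, n_components):
    """Project the rows of vlads (m, D) onto the top principal components."""
    mean = vlads.mean(axis=0)
    centered = vlads - mean
    # Rows of vt are principal directions, sorted by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    return centered @ components.T, mean, components

# Toy data (assumption): 20 random 64-D vectors in place of real VLAD vectors.
rng = np.random.default_rng(0)
vlads = rng.normal(size=(20, 64))
vlads = np.stack([l2_normalize(v) for v in vlads])
reduced, mean, components = pca_reduce(vlads, n_components=8)
print(reduced.shape)  # (20, 8)
```

To reduce a new VLAD vector with the same projection, normalize it, subtract `mean`, and multiply by `components.T`.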
Links
Implementation of the VLAD vector in the open-source library VLFeat.
Original Paper: Jégou, H., Douze, M., Schmid, C., & Pérez, P. (2010). Aggregating local descriptors into a compact image representation. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 3304–3311.