Assume a grey-scale 2D image which can mathematically be described as a matrix. Generalizing the concept of a matrix results in theory about tensors (informally you can think of a multidimensional array). I.e. a RGB 2D image is represented as a tensor of size [width, height, 3]. Further a RGB 3D Image is represented as a tensor of size [width, height, depth, 3]. Moreover and like in the case of matrices you can also perform tensor-tensor multiplications.
For instance consider the typical neural network with 2D images as input. Such a network does basically nothing else than matrix-matrix multiplications (despite of the elementwise non-linear operations at nodes). In the same way a neural network operates on tensors by performing tensor-tensor multiplications.
Now back to your question of feature extraction: Indeed the problem of tensors are their high dimensionality. Hence modern research problems regard the efficient decomposition of tensors retaining the initial (most meaningful) information. In order to extract features from tensors a tensor decomposition approach might be a good start in order to reduce the rank of the tensor. A few papers on tensors in machine learning are:
Tensor Decompositions for Learning Latent Variable Models
Supervised Learning With Quantum-Inspired Tensor Networks
Optimal Feature Extraction and Classification of Tensors via Matrix Product State Decomposition
Hope this helps, even though the math behind is not easy.