A random forest consists of a number of decision trees. Every node in a decision tree is a condition on a single feature, designed to split the dataset into two so that similar response values end up in the same set. The measure by which the (locally) optimal condition is chosen is called impurity. For classification it is typically either Gini impurity or information gain/entropy, and for regression trees it is variance. Thus, when training a tree, one can compute how much each feature decreases the weighted impurity. For a forest, the impurity decrease from each feature can be averaged over all trees, and the features ranked according to this measure, as sketched below.
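Here is a minimal sketch of this mean-decrease-in-impurity ranking using scikit-learn's `feature_importances_`; the diabetes dataset, forest size, and random seed are illustrative choices, not part of the original text.

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

# Illustrative dataset; any tabular X, y would do.
data = load_diabetes()
X, y = data.data, data.target

rf = RandomForestRegressor(n_estimators=100, random_state=0)
rf.fit(X, y)

# feature_importances_ holds the impurity decrease attributed to each
# feature, averaged over all trees and normalized to sum to 1.
ranked = sorted(zip(rf.feature_importances_, data.feature_names), reverse=True)
for score, name in ranked:
    print(f"{name}: {score:.3f}")
```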
It is however important to note that `feature_importances_` in random forests doesn't necessarily give the correct ranking of the features. Two highly correlated features may end up at opposite ends of the ranking table. Dropping the mistakenly ranked feature won't hurt the model's performance, but it does mean the scores are not a reliable measure of each feature's importance. To get around this limitation, I use Sequential Backward Selection, sketched below.
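A minimal sketch of Sequential Backward Selection, assuming scikit-learn's `SequentialFeatureSelector` with `direction="backward"` as the implementation; the estimator, the number of features kept, and the cross-validation setup are illustrative assumptions rather than the author's exact setup.

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SequentialFeatureSelector

data = load_diabetes()
X, y = data.data, data.target

rf = RandomForestRegressor(n_estimators=100, random_state=0)

# Start from the full feature set and greedily drop one feature at a
# time, keeping the subset that scores best under cross-validation.
sbs = SequentialFeatureSelector(
    rf, n_features_to_select=5, direction="backward", cv=5
)
sbs.fit(X, y)

selected = [name for name, keep in zip(data.feature_names, sbs.get_support()) if keep]
print("Selected features:", selected)
```

Because each candidate subset is evaluated by the model's actual cross-validated score rather than by impurity decrease, correlated features no longer distort the ranking in the same way.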