I'm trying to build an ML model for chemistry. The input data set is fairly large (~1M molecules), so I can't compute the full list of available descriptors for every molecule. Instead, I compute all descriptors on a sample and fit my model on it to get a list of the most important descriptors. How can I compute only the descriptors in that list with mordred? I would also be glad to hear about other ways to generate molecular descriptors.
Here is the code:
import pandas as pd
import swifter  # registers the .swifter accessor on pandas objects
from rdkit import Chem
from mordred import Calculator, descriptors

res['mols'] = res['smiles'].swifter.apply(Chem.MolFromSmiles)
calc = Calculator(descriptors, ignore_3D=True)
desc = calc.pandas(res['mols'])
# The model implementation is omitted
# Names of the 100 most important descriptors
most_important = pd.DataFrame((desc.columns, model.feature_importances_)).T.sort_values(by=1, ascending=False).head(100)[0].values
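For reference, one possible direction is sketched below. It assumes that the names in the list match `str(descriptor)` for mordred's descriptor objects (these strings are what `calc.pandas` uses as column labels) and that assigning to `calc.descriptors` replaces the registered set. The helper names `select_descriptors` and `build_reduced_calculator`, the two SMILES strings, and the short descriptor list are hypothetical and only there for illustration.

```python
def select_descriptors(all_descriptors, wanted_names):
    """Keep only the descriptors whose string name is in wanted_names."""
    wanted = set(wanted_names)
    return [d for d in all_descriptors if str(d) in wanted]


def build_reduced_calculator(wanted_names):
    """Build a mordred Calculator restricted to the named descriptors."""
    # Imported here so the filtering helper above works without mordred installed.
    from mordred import Calculator, descriptors

    calc = Calculator(descriptors, ignore_3D=True)
    # Assumption: the wanted names match str(descriptor) for mordred descriptors.
    calc.descriptors = select_descriptors(calc.descriptors, wanted_names)
    return calc


if __name__ == "__main__":
    from rdkit import Chem

    # Hypothetical short list standing in for most_important.
    calc = build_reduced_calculator(["SLogP", "nAtom"])
    mols = [Chem.MolFromSmiles(s) for s in ["CCO", "c1ccccc1"]]
    print(calc.pandas(mols))
```

With the real data, `build_reduced_calculator(most_important)` would take the place of the full calculator, so that `calc.pandas(res['mols'])` only evaluates the selected columns. Whether this is fast enough for ~1M molecules is something I have not measured.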