What is the most efficient way to read a collection of SDF files? SDF is a chemical table file format that stores both the 3D structure of each molecule and properties of that molecule. All of this information is stored in a multiline (gzipped) ASCII file. What I am struggling with is defining a custom file reader function that can interpret the custom subsection of each molecular entry. At this point I doubt whether this is even the right approach. A single entry looks like this:
<Molecular-ID>
-OEChem-10272110393D
Schrodinger Suite 2021-1.
32 34 0 0 0 0 0 0 0999 V2000
31.1383 33.3647 21.1400 C 0 0 0 0 0 0 0 0 0 0 0 0
30.7977 33.9390 19.9173 C 0 0 0 0 0 0 0 0 0 0 0 0
....
M END
> <ShapeTanimoto>
0.6969
> <ColorTanimoto>
0.7854
> <TanimotoCombo>
1.7854
$$$$
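
For reference, this is roughly what I mean by a custom reader: a minimal plain-Python sketch, not a definitive implementation. It assumes that every entry ends with a `$$$$` line, that the structure block ends with an `M  END` line, and that each property is a `> <Name>` header followed by its value lines. The names `iter_sdf_records`, `read_sdf_gz` and `_parse_record` are my own and do not come from any library.

```python
import gzip
from typing import Dict, Iterator, List, TextIO


def _parse_record(lines: List[str]) -> Dict[str, object]:
    """Split one SDF entry into title, structure block, and tagged properties."""
    # The structure block ends at the "M  END" line. If that line is missing,
    # the whole entry is treated as structure and no properties are parsed.
    end = next(
        (i for i, line in enumerate(lines) if line.split() == ["M", "END"]),
        len(lines) - 1,
    )
    properties: Dict[str, str] = {}
    tag = None
    values: List[str] = []
    for line in lines[end + 1:]:
        if line.startswith(">"):
            # A new "> <Name>" header closes the previous property, if any.
            if tag is not None:
                properties[tag] = "\n".join(values).strip()
            start = line.find("<")
            stop = line.find(">", start)
            tag = line[start + 1:stop] if start != -1 and stop != -1 else None
            values = []
        elif tag is not None:
            values.append(line)
    if tag is not None:
        properties[tag] = "\n".join(values).strip()
    return {
        "title": lines[0].strip(),
        "molblock": "\n".join(lines[: end + 1]),
        "properties": properties,
    }


def iter_sdf_records(handle: TextIO) -> Iterator[Dict[str, object]]:
    """Yield one parsed entry at a time so large files are never fully in memory."""
    lines: List[str] = []
    for raw in handle:
        line = raw.rstrip("\r\n")
        if line.strip() == "$$$$":  # entry delimiter
            if any(item.strip() for item in lines):
                yield _parse_record(lines)
            lines = []
        else:
            lines.append(line)
    # Tolerate a final entry that lacks the closing "$$$$".
    if any(item.strip() for item in lines):
        yield _parse_record(lines)


def read_sdf_gz(path: str) -> Iterator[Dict[str, object]]:
    """Stream entries from a gzipped SDF file opened in text mode."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        yield from iter_sdf_records(handle)
```

Property values are kept as strings here; converting something like `ShapeTanimoto` to a float would be a separate step, for example `float(record["properties"]["ShapeTanimoto"])`.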