In R, I would like to parse all chemical properties that PubChem lists for a given compound, using its JSON (or XML) export facility.
Example: ALPHA-IONONE, PubChem compound ID (CID) 5282108
https://pubchem.ncbi.nlm.nih.gov/compound/5282108
library("rjson")
data <- rjson::fromJSON(file="https://pubchem.ncbi.nlm.nih.gov/rest/pug_view/data/compound/5282108/JSON/?response_type=display")
or
library("RJSONIO")
data <- RJSONIO::fromJSON("https://pubchem.ncbi.nlm.nih.gov/rest/pug_view/data/compound/5282108/JSON/?response_type=display")
Either call gets me a tree of nested lists, but how do I go from this rather complicated list of nested lists to a nice dataframe or list of dataframes?
In this case, what I am after is everything under
3.1 Computed Descriptors
3.2 Other Identifiers
3.3 Synonyms
4.1 Computed Properties
I would like all of these in a single row of a dataframe, with each element in its own named column and multiple items per element (e.g. multiple synonyms) pasted together with "|" as a delimiter. In this case, something like
pubchemid IUPAC_Name InChI InChI_Key Canonical SMILES Isomeric SMILES CAS EC Number Wikipedia MeSH Synonyms Depositor-Supplied Synonyms Molecular_Weight Molecular_Formula XLogP3 Hydrogen_Bond_Donor_Count ...
5282108 (E)-4-(2,6,6-trimethylcyclohex-2-en-1-yl)but-3-en-2-one InChI=1S/C13H20O/c1-10-6-5-9-13(3,4)12(10)8-7-11(2)14/h6-8,12H,5,9H2,1-4H3/b8-7+ ....
Fields with multiple items, such as Depositor-Supplied Synonyms could be pasted together with a "|", e.g. value could be ALPHA-IONONE|Iraldeine|...
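To make the target shape concrete, here is a minimal sketch of the flattening step on a small hand-built list rather than the live download. The field names `TOCHeading`, `Section`, `Information` and `Value` are assumptions about how the parsed JSON is nested and may not match the real response; `collect_values` is a hypothetical helper, not part of rjson or RJSONIO.

```r
# Toy stand-in for the list returned by fromJSON(); the nesting and field
# names (TOCHeading, Section, Information, Value) are assumptions.
record <- list(
  Section = list(
    list(TOCHeading = "IUPAC Name",
         Information = list(
           list(Value = "(E)-4-(2,6,6-trimethylcyclohex-2-en-1-yl)but-3-en-2-one"))),
    list(TOCHeading = "Synonyms",
         Information = list(list(Value = "ALPHA-IONONE"),
                            list(Value = "Iraldeine"))),
    list(TOCHeading = "Chemical and Physical Properties",
         Section = list(
           list(TOCHeading = "Molecular Formula",
                Information = list(list(Value = "C13H20O")))))
  )
)

# Hypothetical helper: walk the tree recursively and return a named list
# with one "|"-delimited string per heading that carries Information.
collect_values <- function(node) {
  out <- list()
  if (!is.null(node$Information)) {
    vals <- vapply(node$Information,
                   function(info) as.character(info$Value),
                   character(1))
    out[[node$TOCHeading]] <- paste(vals, collapse = "|")
  }
  for (child in node$Section) {   # zero iterations when Section is NULL
    out <- c(out, collect_values(child))
  }
  out
}

fields <- collect_values(record)
# Turn headings into syntactic column names, e.g. "IUPAC Name" -> "IUPAC_Name".
names(fields) <- gsub("[^A-Za-z0-9]+", "_", names(fields))
row <- as.data.frame(c(list(pubchemid = 5282108), fields),
                     stringsAsFactors = FALSE)
print(row)
```

The recursion does not depend on section numbers such as 3.1 or 4.1, only on the headings, so restricting the output to the wanted sections would be a matter of filtering `names(fields)` afterwards.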
Second, I would also like to import section 4.2.2 Kovats Retention Index as a dataframe in long format:
pubchemid column_class kovats_ri
5282108 Standard non-polar 1413
5282108 Standard non-polar 1417
...
5282108 Semi-standard non-polar 1427
...
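The long format above could be built along these lines once the Kovats entries have been located in the nested list. This is only a sketch on toy data: the assumption that each entry carries a `Name` (the column class) and a numeric vector `Value` is mine, and the real response may be laid out differently.

```r
# Toy stand-in for the Information entries of the Kovats section; the
# Name/Value layout is an assumption. Values are the ones quoted above.
kovats_info <- list(
  list(Name = "Standard non-polar", Value = c(1413, 1417)),
  list(Name = "Semi-standard non-polar", Value = c(1427))
)

# One small data frame per column class, stacked with rbind; pubchemid and
# column_class are recycled to the length of the Value vector.
kovats <- do.call(rbind, lapply(kovats_info, function(info) {
  data.frame(pubchemid = 5282108,
             column_class = info$Name,
             kovats_ri = info$Value,
             stringsAsFactors = FALSE)
}))
print(kovats)
```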
(section 4.3.1 GC-MS would have been nice too, but since it only displays the 3 top peaks this is a little useless right now, so I'll skip that)
Does anybody have an idea how to achieve this in an elegant way?
PS Note that not all these fields will necessarily exist for any given query.
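One way to cope with missing fields, sketched here under the assumption that the extracted values are already in a named list (`found` below is toy data and `wanted` is a hypothetical choice of columns), is to fix the column set up front and fill absent entries with `NA`, so rows for different compounds can later be stacked with `rbind`.

```r
# Columns every row should have, whether or not the compound provides them.
wanted <- c("IUPAC_Name", "CAS", "Molecular_Formula")

# Toy extraction result in which the CAS field is absent.
found <- list(
  IUPAC_Name = "(E)-4-(2,6,6-trimethylcyclohex-2-en-1-yl)but-3-en-2-one",
  Molecular_Formula = "C13H20O"
)

# Look up each wanted field; [[ ]] returns NULL for a missing name,
# which is replaced by NA so the column still exists.
values <- lapply(wanted, function(nm) {
  if (is.null(found[[nm]])) NA_character_ else found[[nm]]
})
names(values) <- wanted
row <- as.data.frame(values, stringsAsFactors = FALSE)
print(row)
```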
2D structure and some properties can also be obtained from
and 3D structure from
Data can also be exported as XML, using
https://pubchem.ncbi.nlm.nih.gov/rest/pug_view/data/compound/5282108/XML/?response_type=display
if that would be any easier
Note: I also tried the R package rpubchem, but that one only seems to import a small amount of the available info:
library("rpubchem")
get.cid(5282108)
CID IUPACName CanonicalSmile MolecularFormula MolecularWeight TotalFormalCharge XLogP HydrogenBondDonorCount HydrogenBondAcceptorCount HeavyAtomCount TPSA
2 5282108 (E)-4-(2,6,6-trimethylcyclohex-2-en-1-yl)but-3-en-2-one C13H20O 192.297300 0 3 0 1 14 17 5282108