I have about 3500 CAS numbers that I would like to extract the chemical information from pubchem and put into a dataframe. I have no idea on how to format the output so I can put it into a dataframe when I use the code below. The output of each call (please see below) seems to give me the same format. It consists of a list of 9 things of varying size, 2 of which are tibbles of varying size.. Any ideas would be appreciated!! Thank you!
library(dplyr)
library(webchem)
ci_query(query, from = c("rn", "inchikey"), verbose = getOption("verbose"))
y1 <- ci_query('50-00-0', from = 'rn')
which yields:
y1
$`50-00-0`
$`50-00-0`$name
[1] "Formaldehyde [USP]" "Methanal"
$`50-00-0`$synonyms
[1] "AI3-26806" "Aldehyd mravenci" "Aldehyd
mravenci [Czech]"
[4] "Aldehyde formique" "Aldehyde formique [French]" "Aldehyde
formique [ISO-French]"
[7] "Aldeide formica" "Aldeide formica [Italian]" "BFV"
[10] "Caswell No. 465" "CCRIS 315" "Dormol"
[13] "EC 200-001-8" "EINECS 200-001-8" "EPA
Pesticide Chemical Code 043001"
[16] "Fannoform" "Formalaz"
"Formaldehyd"
[19] "Formaldehyd [Czech, Polish]" "Formaldehyde"
"Formaldehyde solution"
[22] "Formaldehyde, gas" "Formalin" "Formalin
40"
[25] "Formalin [JAN]" "Formalin-loesungen"
"Formalin-loesungen [German]"
[28] "Formalina" "Formalina [Italian]"
"Formaline"
[31] "Formaline [German]" "Formalith" "Formic
aldehyde"
[34] "Formol" "FYDE" "HSDB
164"
[37] "Karsan" "Lysoform"
"Methaldehyde"
[40] "Methanal" "Methyl aldehyde"
"Methylene oxide"
[43] "Morbicid" "NCI-C02799" "NSC
298885"
[46] "Oplossingen" "Oplossingen [Dutch]"
"Oxomethane"
[49] "Oxymethylene" "Paraform" "RCRA
waste number U122"
[52] "Superlysoform" "UN 1198" "UN 2209
(formalin)"
[55] "UNII-1HG84L3525"
$`50-00-0`$cas
[1] "50-00-0"
$`50-00-0`$inchi
[1] "InChI=1S/CH2O/c1-2/h1H2"
$`50-00-0`$inchikey
[1] "WSFSSNUMVMOOMR-UHFFFAOYSA-N"
$`50-00-0`$smiles
[1] "C=O"
$`50-00-0`$toxicity
# A tibble: 24 x 6
Organism `Test Type` Route `Reported Dose (Normalized Dose)` Effect
Source
<chr> <chr> <chr> <chr> <chr>
<chr>
1 cat LCLo inhalation 400mg/m3/2H (400mg/m3) ""
"\"To~
2 cat LDLo intravenous 30mg/kg (30mg/kg) "BLOOD: OTHER
CHANGES" "Acta~
3 dog LDLo intravenous 70mg/kg (70mg/kg) ""
"Inte~
4 dog LDLo subcutaneous 350mg/kg (350mg/kg) ""
"Inte~
5 frog LDLo parenteral 800ug/kg (0.8mg/kg) ""
"Inte~
6 guinea pig LD50 oral 260mg/kg (260mg/kg) ""
"Jour~
7 human TCLo inhalation 17mg/m3/30M (17mg/m3) "LUNGS, THORAX,
OR RESPIRATION: OTHER CHANGESSENSE OR~ "JAMA~
8 man LDLo unreported 477mg/kg (477mg/kg) ""
"\"Po~
9 man TCLo inhalation 300ug/m3 (0.3mg/m3) "SENSE ORGANS
AND SPECIAL SENSES: OTHER CHANGES: OLFA~ "Gigi~
10 man TDLo oral 643mg/kg (643mg/kg)
"GASTROINTESTINAL: NAUSEA OR VOMITINGLUNGS, THORAX, O~ "Japa~
# ... with 14 more rows
$`50-00-0`$physprop
# A tibble: 8 x 5
`Physical Property` Value Units `Temp (deg C)` Source
<chr> <dbl> <chr> <int> <chr>
1 Melting Point -9.2 e+ 1 deg C NA EXP
2 Boiling Point -1.91e+ 1 deg C NA EXP
3 pKa Dissociation Constant 1.33e+ 1 (none) 25 EXP
4 log P (octanol-water) 3.5 e- 1 (none) NA EXP
5 Water Solubility 4 e+ 5 mg/L 20 EXP
6 Vapor Pressure 3.89e+ 3 mm Hg 25 EXP
7 Henry's Law Constant 3.37e- 7 atm-m3/mole 25 EXP
8 Atmospheric OH Rate Constant 9.37e-12 cm3/molecule-sec 25 EXP
$`50-00-0`$source_url
[1] "https://chem.nlm.nih.gov/chemidplus/rn/50-00-0"
attr(,"class")
[1] "ci_query" "list"