
I have about 3,500 CAS numbers and would like to extract the chemical information for each one and put it into a data frame (I thought of this as PubChem data, but the source_url in the output below shows that the query goes to ChemIDplus). I don't know how to reshape the output of the code below so that it fits into a data frame. Every call (see the output below) seems to return the same structure: a list of 9 elements of varying size, 2 of which are tibbles with a varying number of rows. Any ideas would be appreciated. Thank you!

library(dplyr)

library(webchem)

# Function signature: ci_query(query, from = c("rn", "inchikey"), verbose = getOption("verbose"))

y1 <- ci_query('50-00-0', from = 'rn')

which yields:

y1
$`50-00-0`
$`50-00-0`$name
[1] "Formaldehyde [USP]" "Methanal"          

$`50-00-0`$synonyms
 [1] "AI3-26806"                          "Aldehyd mravenci"                   "Aldehyd mravenci [Czech]"
 [4] "Aldehyde formique"                  "Aldehyde formique [French]"         "Aldehyde formique [ISO-French]"
 [7] "Aldeide formica"                    "Aldeide formica [Italian]"          "BFV"
[10] "Caswell No. 465"                    "CCRIS 315"                          "Dormol"
[13] "EC 200-001-8"                       "EINECS 200-001-8"                   "EPA Pesticide Chemical Code 043001"
[16] "Fannoform"                          "Formalaz"                           "Formaldehyd"
[19] "Formaldehyd [Czech, Polish]"        "Formaldehyde"                       "Formaldehyde solution"
[22] "Formaldehyde, gas"                  "Formalin"                           "Formalin 40"
[25] "Formalin [JAN]"                     "Formalin-loesungen"                 "Formalin-loesungen [German]"
[28] "Formalina"                          "Formalina [Italian]"                "Formaline"
[31] "Formaline [German]"                 "Formalith"                          "Formic aldehyde"
[34] "Formol"                             "FYDE"                               "HSDB 164"
[37] "Karsan"                             "Lysoform"                           "Methaldehyde"
[40] "Methanal"                           "Methyl aldehyde"                    "Methylene oxide"
[43] "Morbicid"                           "NCI-C02799"                         "NSC 298885"
[46] "Oplossingen"                        "Oplossingen [Dutch]"                "Oxomethane"
[49] "Oxymethylene"                       "Paraform"                           "RCRA waste number U122"
[52] "Superlysoform"                      "UN 1198"                            "UN 2209 (formalin)"
[55] "UNII-1HG84L3525"

$`50-00-0`$cas
[1] "50-00-0"

$`50-00-0`$inchi
[1] "InChI=1S/CH2O/c1-2/h1H2"

$`50-00-0`$inchikey
[1] "WSFSSNUMVMOOMR-UHFFFAOYSA-N"

$`50-00-0`$smiles
[1] "C=O"

$`50-00-0`$toxicity
# A tibble: 24 x 6
   Organism   `Test Type` Route        `Reported Dose (Normalized Dose)` Effect                                                 Source
   <chr>      <chr>       <chr>        <chr>                             <chr>                                                  <chr>
 1 cat        LCLo        inhalation   400mg/m3/2H (400mg/m3)            ""                                                     "\"To~
 2 cat        LDLo        intravenous  30mg/kg (30mg/kg)                 "BLOOD: OTHER CHANGES"                                 "Acta~
 3 dog        LDLo        intravenous  70mg/kg (70mg/kg)                 ""                                                     "Inte~
 4 dog        LDLo        subcutaneous 350mg/kg (350mg/kg)               ""                                                     "Inte~
 5 frog       LDLo        parenteral   800ug/kg (0.8mg/kg)               ""                                                     "Inte~
 6 guinea pig LD50        oral         260mg/kg (260mg/kg)               ""                                                     "Jour~
 7 human      TCLo        inhalation   17mg/m3/30M (17mg/m3)             "LUNGS, THORAX, OR RESPIRATION: OTHER CHANGESSENSE OR~ "JAMA~
 8 man        LDLo        unreported   477mg/kg (477mg/kg)               ""                                                     "\"Po~
 9 man        TCLo        inhalation   300ug/m3 (0.3mg/m3)               "SENSE ORGANS AND SPECIAL SENSES: OTHER CHANGES: OLFA~ "Gigi~
10 man        TDLo        oral         643mg/kg (643mg/kg)               "GASTROINTESTINAL: NAUSEA OR VOMITINGLUNGS, THORAX, O~ "Japa~
# ... with 14 more rows

$`50-00-0`$physprop
# A tibble: 8 x 5
`Physical Property`              Value Units            `Temp (deg C)` Source
<chr>                            <dbl> <chr>                     <int> <chr> 
1 Melting Point                -9.2 e+ 1 deg C                        NA EXP   
2 Boiling Point                -1.91e+ 1 deg C                        NA EXP   
3 pKa Dissociation Constant     1.33e+ 1 (none)                       25 EXP   
4 log P (octanol-water)         3.5 e- 1 (none)                       NA EXP   
5 Water Solubility              4   e+ 5 mg/L                         20 EXP   
6 Vapor Pressure                3.89e+ 3 mm Hg                        25 EXP   
7 Henry's Law Constant          3.37e- 7 atm-m3/mole                  25 EXP   
8 Atmospheric OH Rate Constant  9.37e-12 cm3/molecule-sec             25 EXP   

$`50-00-0`$source_url
[1] "https://chem.nlm.nih.gov/chemidplus/rn/50-00-0"


 attr(,"class")
[1] "ci_query" "list"    
  • Do you need all of the returned information? It might be easier to write some functions which extract just the parts you need for further work. – neilfws Jun 24 '22 at 01:42
  • I do need it all. One row for each CAS number.. – tom Jun 24 '22 at 11:26
  • I think you're going to struggle to get a simple, meaningful data frame from a complex object like this. It's often better in these cases to work with the object "as is" and write some functions to extract whatever you need. – neilfws Jun 25 '22 at 02:41
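One possible way to reconcile "one row for each CAS number" with the suggestion to write small extractor functions is sketched below. It is only a sketch under stated assumptions, not a tested solution against the live service. It collapses the multi-valued character fields (name, synonyms) into a single string and keeps the two tables (toxicity, physprop) as list-columns, so each CAS number still occupies exactly one row. The names `mock`, `scalar_fields`, `collapse_field`, `flatten_ci` and `stack_table` are hypothetical helpers introduced here. The `mock` object imitates the structure printed in the question so the code runs offline; it is assumed that a real call would pass a vector of CAS numbers to ci_query() and that a failed lookup comes back as NA. "00-00-0" is a placeholder, not a real CAS number.

```r
# Mock of the structure printed in the question, so the sketch runs offline.
# Assumption: real input would come from ci_query(<vector of CAS numbers>, from = "rn"),
# and a failed lookup is returned as NA.
mock <- list(
  `50-00-0` = list(
    name       = c("Formaldehyde [USP]", "Methanal"),
    synonyms   = c("Formalin", "Methanal", "Formol"),
    cas        = "50-00-0",
    inchi      = "InChI=1S/CH2O/c1-2/h1H2",
    inchikey   = "WSFSSNUMVMOOMR-UHFFFAOYSA-N",
    smiles     = "C=O",
    toxicity   = data.frame(Organism = c("cat", "dog"),
                            `Test Type` = c("LCLo", "LDLo"),
                            check.names = FALSE),
    physprop   = data.frame(`Physical Property` = c("Melting Point", "Boiling Point"),
                            Value = c(-92, -19.1),
                            check.names = FALSE),
    source_url = "https://chem.nlm.nih.gov/chemidplus/rn/50-00-0"
  ),
  `00-00-0` = NA  # placeholder for a CAS number that was not found
)

# Character fields that become ordinary (one string per row) columns.
scalar_fields <- c("name", "synonyms", "cas", "inchi",
                   "inchikey", "smiles", "source_url")

# Collapse one field of one result into a single string; NA if unavailable.
collapse_field <- function(x, field) {
  if (!is.list(x) || is.null(x[[field]])) return(NA_character_)
  paste(x[[field]], collapse = "; ")
}

# One row per query; the nested tables are kept as list-columns.
flatten_ci <- function(res) {
  cols <- lapply(scalar_fields, function(f) {
    vapply(res, collapse_field, character(1), field = f, USE.NAMES = FALSE)
  })
  names(cols) <- scalar_fields
  out <- data.frame(query = names(res), cols, stringsAsFactors = FALSE)
  out$toxicity <- unname(lapply(res, function(x) if (is.list(x)) x$toxicity else NULL))
  out$physprop <- unname(lapply(res, function(x) if (is.list(x)) x$physprop else NULL))
  out
}

# Alternative view: stack one nested table across all queries into a
# long data frame keyed by the query string.
stack_table <- function(res, field) {
  ok <- Filter(function(x) {
    is.list(x) && is.data.frame(x[[field]]) && nrow(x[[field]]) > 0
  }, res)
  parts <- Map(function(x, id) cbind(query = id, x[[field]]), ok, names(ok))
  do.call(rbind, unname(parts))
}

flat <- flatten_ci(mock)              # one row per CAS number
tox  <- stack_table(mock, "toxicity") # one row per toxicity record
```

With this layout, flat$toxicity[[1]] returns the toxicity table for the first CAS number, while `tox` can be joined back to `flat` on the query column when a long format is more convenient. Whether to collapse synonyms into one string or keep them as a list-column too is a design choice that depends on how they will be used later.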
