I am trying to write a function, get_CIDname(), that looks up the name of a chemical compound from its CID.
Each chemical compound has a designated CID, Compound ID, from PubChem's chemical database.
For example, acetic acid is 176 and water is 962.
I have a data frame with a column of these CIDs and some other character value columns. I would like to mutate a new column that gives, for each CID, the title name shown on that compound's page on the site.
For example, every instance of 962 in this identifier column would be mapped to 'Water', and every instance of 176 to 'Acetic Acid', which is the main name shown on the page https://pubchem.ncbi.nlm.nih.gov/compound/CID (with CID replaced by the compound's number).
Example dataset:
df <- data.frame("Compound" = c(176,29096,6341,8914,5366204,98464,11572,9231,535144,15669393,1738127,1738124), "Value" = rnorm(12, mean = 500000, sd = 600000))
Desired output:
df <- data.frame("Compound" = c(176,29096,6341,8914,5366204,98464,11572,9231,535144,15669393,1738127,1738124), "Value" = rnorm(12, mean = 500000, sd = 600000),
Match = c("Acetic Acid", "Dihydromyrcenol", etc....))
Currently, I have:
library(rvest)  # provides read_html()

get_CIDname <- function(CID){
  # Download and parse the PubChem compound page for this CID
  read_html(paste0("https://pubchem.ncbi.nlm.nih.gov/compound/",
                   CID))
}
but I do not know how to decipher the HTML of PubChem's website to pull out the compound name. What comes next? What is this type of syntax/programming called?
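For reference, pulling values out of a downloaded page like this is generally called web scraping. Here is a minimal sketch of one direction that might avoid parsing the HTML at all. It assumes PubChem's PUG REST interface returns the page title as plain text at the URL pattern below (the "Title" property). That assumption may need checking, since the compound page itself may be rendered by JavaScript, in which case read_html() would not see the name. The helper pubchem_title_url() is a made-up name for illustration, not part of any package.

```r
# Hypothetical helper: build the PUG REST URL that (by assumption) returns
# the compound's title as plain text. format() avoids scientific notation
# for large numeric CIDs (e.g. 1e+05).
pubchem_title_url <- function(CID) {
  paste0("https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/",
         format(CID, scientific = FALSE, trim = TRUE),
         "/property/Title/TXT")
}

# Sketch of get_CIDname(): one request per CID, first line of the response.
# Returns NA for any CID whose request fails.
get_CIDname <- function(CID) {
  vapply(CID, function(id) {
    tryCatch(readLines(pubchem_title_url(id), warn = FALSE)[1],
             error = function(e) NA_character_)
  }, character(1), USE.NAMES = FALSE)
}

# Usage (needs internet access, so left commented out):
# df$Match <- get_CIDname(df$Compound)
```

This makes one request per row, so for a long column it would probably be worth looking names up once per unique CID and joining the result back, and pausing between requests to stay within PubChem's usage limits.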