
I am trying to get the list of MedDRA codes (codes for adverse events) from this page:

https://bioportal.bioontology.org/ontologies/MEDDRA?p=classes&conceptid=10040786

The list is hidden here:

[screenshot]

It is in this element:

[screenshot]

When I click it, I get this list:

[screenshot]

I could then scrape it with rvest to get the MedDRA codes embedded in the links:

[screenshot]

The problem is how to expand the list automatically.

Looking at the XHR requests, I found this one, which opens the list:

https://bioportal.bioontology.org/ajax_concepts/MEDDRA/?conceptid=http%3A%2F%2Fpurl.bioontology.org%2Fontology%2FMEDDRA%2F10040786&callback=children&_=1667913401293

But I do not understand what the last number represents, so I cannot automate the request. Is there another way? How could I proceed to get this data?

zephryl
denis
  • Can you access the list elements using XPath without expanding the list? If not, you may want to try RSelenium to expand the list. – zephryl Nov 08 '22 at 14:05
  • Alternatively, have you looked at [their API](http://data.bioontology.org/documentation)? – zephryl Nov 08 '22 at 14:09
  • @zephryl No, I cannot, and I would like to avoid RSelenium if possible. – denis Nov 08 '22 at 15:31
  • @zephryl No, I haven't, good point. I will have a look, but I am still interested in an answer. – denis Nov 08 '22 at 15:32
  • The last bit is just a [Unix](https://www.unixtimestamp.com/) timestamp. Amongst other things, it helps avoid being served cached results. It can be excluded, or recreated if you plan to make a large number of requests within a relatively short time frame. – QHarr Nov 08 '22 at 22:34
  • So, make an HTTP request to the AJAX endpoint you have identified, either removing the Unix timestamp or generating a fresh one and appending it. Parse the response with rvest and extract the elements of interest. – QHarr Nov 08 '22 at 22:42
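Following QHarr's suggestion, here is a minimal base-R sketch of rebuilding that request. It assumes the endpoint and its `conceptid`/`callback` parameters behave as in the captured URL, and that `_` is only a cache-busting timestamp. The 13-digit value is milliseconds since the Unix epoch; it decodes to 8 Nov 2022 13:16:41 UTC, shortly before the first comment. `build_children_url()` and `fetch_child_hrefs()` are hypothetical helper names, and the fetch step assumes (unverified) that the response is HTML rvest can parse.

```r
# Decode the timestamp from the captured request (milliseconds since 1970-01-01 UTC).
captured_ms <- 1667913401293
captured_time <- as.POSIXct(captured_ms / 1000, origin = "1970-01-01", tz = "UTC")
format(captured_time, "%Y-%m-%d %H:%M:%S")

# Hypothetical helper: rebuild the AJAX URL for a MedDRA concept.
# Assumption: the endpoint accepts the same parameters as the captured request.
build_children_url <- function(concept_id, add_timestamp = TRUE) {
  iri <- paste0("http://purl.bioontology.org/ontology/MEDDRA/", concept_id)
  url <- paste0(
    "https://bioportal.bioontology.org/ajax_concepts/MEDDRA/",
    # reserved = TRUE encodes ":" and "/" as %3A and %2F, as in the captured URL
    "?conceptid=", utils::URLencode(iri, reserved = TRUE),
    "&callback=children"
  )
  if (add_timestamp) {
    # Current time as whole milliseconds, printed without scientific notation
    ms <- sprintf("%.0f", as.numeric(Sys.time()) * 1000)
    url <- paste0(url, "&_=", ms)
  }
  url
}

# Hypothetical helper: fetch the children and return every link's href.
# Requires network access and the rvest package; assumes an HTML response.
fetch_child_hrefs <- function(concept_id) {
  page <- rvest::read_html(build_children_url(concept_id))
  rvest::html_attr(rvest::html_elements(page, "a"), "href")
}
```

Under those assumptions, `fetch_child_hrefs("10040786")` would return the child links, ready for the same code-extraction step used in the answer. If the response turns out to be wrapped in a `children(...)` JSONP callback, that wrapper would need stripping before parsing.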

1 Answer

I was able to extract the numbers with RSelenium, using the following code:

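library(RSelenium)

# Start a Selenium server in Docker (system() works on every OS; shell() is Windows-only)
system('docker run -d -p 4446:4444 selenium/standalone-firefox')
Sys.sleep(5)  # give the container a moment to start

remDr <- remoteDriver(remoteServerAddr = "localhost", port = 4446L, browserName = "firefox")
remDr$open()
remDr$navigate("https://bioportal.bioontology.org/ontologies/MEDDRA?p=classes&conceptid=10040786")
Sys.sleep(3)  # wait for the class tree to render
remDr$screenshot(display = TRUE)  # optional: check what the browser sees

# Click the "+" icon that expands the list of child concepts
web_Obj_Plus_Sign <- remDr$findElement('xpath', '/html/body/div[1]/div[2]/div[5]/div/div/div[2]/div/div[2]/div[2]/div[1]/div[2]/div/ul/li/ul/li[2]/img')
web_Obj_Plus_Sign$clickElement()
Sys.sleep(3)  # the children are loaded via AJAX, so wait before querying them

list_Url <- list()

# The XPath picks every second <li> (li[2], li[4], ...); stop at the first index without a link
for (i in 1:100)
{
  xpath <- paste0('/html/body/div[1]/div[2]/div[5]/div/div/div[2]/div/div[2]/div[2]/div[1]/div[2]/div/ul/li/ul/li[2]/ul/li[', i * 2, ']/a')
  web_Obj_Link <- tryCatch(remDr$findElement("xpath", xpath), error = function(e) NULL)

  if (is.null(web_Obj_Link))
  {
    break
  }
  list_Url[[i]] <- web_Obj_Link$getElementAttribute("href")[[1]]
}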

MEDDRA_Number <- unlist(lapply(X = list_Url, FUN = function(x) tail(strsplit(x, "%")[[1]], 1)))
MEDDRA_Number

 [1] "2F10000318" "2F10000513" "2F10075963" "2F10059136" "2F10049044" "2F10005192" "2F10051548" "2F10007247"
 [9] "2F10074010" "2F10012470" "2F10065259" "2F10060803" "2F10014141" "2F10014199" "2F10015146" "2F10057211"
[17] "2F10021531" "2F10063866" "2F10071367" "2F10050500" "2F10021784" "2F10065487" "2F10054994" "2F10073621"
[25] "2F10076139" "2F10061303" "2F10061304" "2F10051296" "2F10054019" "2F10087209" "2F10069447" "2F10037578"
[33] "2F10037632" "2F10085875" "2F10037888" "2F10069443" "2F10040855" "2F10040872" "2F10042343" "2F10085173"
[41] "2F10055027" "2F10067653" "2F10066047
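The `2F` prefix in this output is left over from the URL-encoded slash (`%2F`) in front of each code, because the split is on `%`. A minimal base-R sketch that keeps only the digits, assuming every `href` ends in `%2F<code>` as the output suggests. `extract_meddra_code()` and the example hrefs are hypothetical; the hrefs mimic the `conceptid` format of the AJAX URL in the question.

```r
# Hypothetical helper: return the digits after the last "%2F" in each href,
# or NA when an href does not end in an encoded "/<digits>".
extract_meddra_code <- function(hrefs) {
  pattern <- ".*%2F([0-9]+)$"
  matched <- grepl(pattern, hrefs, ignore.case = TRUE)
  ifelse(matched, sub(pattern, "\\1", hrefs, ignore.case = TRUE), NA_character_)
}

# Hypothetical example links in the assumed format
example_hrefs <- c(
  "/ontologies/MEDDRA?p=classes&conceptid=http%3A%2F%2Fpurl.bioontology.org%2Fontology%2FMEDDRA%2F10000318",
  "/ontologies/MEDDRA?p=classes&conceptid=http%3A%2F%2Fpurl.bioontology.org%2Fontology%2FMEDDRA%2F10000513"
)
extract_meddra_code(example_hrefs)
# "10000318" "10000513"
```

Applied to the answer's results, `extract_meddra_code(unlist(list_Url))` would replace the `strsplit()` step; on the vector already computed, `sub("^2F", "", MEDDRA_Number)` has the same effect.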
Emmanuel Hamel