
https://www.genecards.org/cgi-bin/carddisp.pl?gene=ZSCAN22

On the above webpage, if I click "See all 33", Chrome DevTools shows that the following GET request is sent:

https://www.genecards.org/gene/api/data/Enhancers?geneSymbol=ZSCAN22

Accessing this URL directly is blocked.

I have tried using Puppeteer. I can click "See all 33" with Puppeteer, but then I need to parse the resulting HTML. It would be best to get the results directly from https://www.genecards.org/gene/api/data/Enhancers?geneSymbol=ZSCAN22. I am not sure how to obtain that response after clicking "See all 33" with Puppeteer.

I am not sure whether Apify can help.

Can anybody let me know how to scrape it?

user1424739
In the headers of the request you can see `rvhk: xxxx`. That is probably a token of some sort generated by a prior request. You might need to send that request, receive the token, and update your headers. – niko Oct 20 '19 at 21:56
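To make the comment's suggestion concrete, here is a minimal Python sketch that only builds the API request with the `rvhk` header attached. It assumes the token value has already been obtained (for example, copied from DevTools); how the site generates it is not known here. The function name `build_enhancers_request` and the `Referer` header are illustrative assumptions, and the server may also require cookies or other headers.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint observed in Chrome DevTools after clicking "See all 33".
API_URL = "https://www.genecards.org/gene/api/data/Enhancers"
PAGE_URL = "https://www.genecards.org/cgi-bin/carddisp.pl"


def build_enhancers_request(gene_symbol, rvhk_token):
    """Build (but do not send) a GET request for the Enhancers endpoint.

    rvhk_token is the value of the `rvhk` request header seen in DevTools.
    ASSUMPTION: sending a Referer that points at the gene page is a guess at
    what the server expects; it is not confirmed by the site.
    """
    query = urlencode({"geneSymbol": gene_symbol})
    headers = {
        "rvhk": rvhk_token,
        "Referer": "{}?{}".format(PAGE_URL, urlencode({"gene": gene_symbol})),
    }
    return Request("{}?{}".format(API_URL, query), headers=headers)


if __name__ == "__main__":
    # "xxxx" is the placeholder from the comment, not a working token.
    request = build_enhancers_request("ZSCAN22", "xxxx")
    print(request.full_url)
    print(request.header_items())
```

The request could then be sent with `urllib.request.urlopen(request)`. If the token is tied to a browser session, this may still be rejected, in which case driving a real browser is the safer route.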

1 Answer


I used Selenium, and it works fine:

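from selenium import webdriver
from selenium.webdriver.common.by import By

# Path to the local chromedriver binary. Recent Selenium 4 releases no longer
# accept executable_path and take a Service object instead.
browser = webdriver.Chrome(executable_path="C:/src/webdriver/chromedriver.exe")
genesLocations = 'https://www.genecards.org/cgi-bin/carddisp.pl?gene={}'

# Extract Genomic Locations
gene = 'ZSCAN22'
browser.get(genesLocations.format(gene))
# find_element(By.XPATH, ...) replaces the deprecated find_element_by_xpath
location = browser.find_element(By.XPATH, '//*[@id="genomic_location"]/div/div[3]/div/div')
print(location.text)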
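The same browser-driven approach can, in principle, read the Enhancers API response itself instead of parsing the rendered HTML. The following is a hedged sketch, not a tested recipe for this site: it assumes a recent Selenium 4 with Chrome, enables Chrome's performance log, clicks the link, and asks the DevTools protocol for the response body. The helper names and the "See all" link locator are assumptions made for illustration.

```python
import json


def find_request_id(log_entries, url_fragment):
    """Return the DevTools requestId of the first logged response whose URL
    contains url_fragment, or None if there is no such entry."""
    for entry in log_entries:
        message = json.loads(entry["message"])["message"]
        if message.get("method") != "Network.responseReceived":
            continue
        params = message.get("params", {})
        if url_fragment in params.get("response", {}).get("url", ""):
            return params.get("requestId")
    return None


def fetch_enhancers_body(gene, timeout=15.0):
    """Open the gene page, click "See all", and return the API response text."""
    # Imported here so the log-parsing helper above needs only the stdlib.
    import time
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    options = webdriver.ChromeOptions()
    # Ask chromedriver to record DevTools network events in the performance log.
    options.set_capability("goog:loggingPrefs", {"performance": "ALL"})
    driver = webdriver.Chrome(options=options)
    try:
        driver.get("https://www.genecards.org/cgi-bin/carddisp.pl?gene={}".format(gene))
        # ASSUMPTION: the "See all 33" control is a link whose text contains
        # "See all" and is the first such link; adjust the locator to the real page.
        driver.find_element(By.PARTIAL_LINK_TEXT, "See all").click()
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            # Each get_log call returns only the entries recorded since the last call.
            request_id = find_request_id(
                driver.get_log("performance"), "/gene/api/data/Enhancers"
            )
            if request_id is not None:
                time.sleep(1)  # crude wait for the response body to finish loading
                result = driver.execute_cdp_cmd(
                    "Network.getResponseBody", {"requestId": request_id}
                )
                return result["body"]
            time.sleep(0.5)
        return None
    finally:
        driver.quit()
```

Calling `fetch_enhancers_body("ZSCAN22")` would return the raw response text, or `None` if no matching request appears before the timeout. Because the request is made by the page itself, the `rvhk` header is produced by the site's own scripts rather than reconstructed by hand.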
mmblack