I am using BeautifulSoup (bs4) to extract data from an SSRN paper page. Here is the URL for reference: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=962461. The data I want is in the PlumX Metrics widget on the right of the page. If you hover over it you will see 'Citations: 95', and I would like to extract the 95. This appears in the HTML as:
`<li class="plx-citation">
<span class="ppp-label">Citation Indexes: </span>
<span class="ppp-count">95</span>
</li>`
I have tried several approaches in Python, but none of them work:
1) Extracting the information by class:
`soup.find("li", {"class": "ppp-count"})`
The output is `None`.
2) Extracting the information by XPath, using lxml instead of BeautifulSoup:
`from lxml import html
tree = html.fromstring(paper_url.content)
r = tree.xpath('//*[@id="maincontent"]/div[2]/div[2]/div/div[2]/div/div[2]/div/div/div/ul/li[1]/ul/li/span[2]')`
The output is `[]`.
3) I printed out the whole parsed document from both BeautifulSoup and lxml, and the PlumX data is simply missing: these branches of the HTML are not there, and in fact the citations section has no HTML there either.
The markup is there on the live page (you can see it with Inspect Element in a browser), but it is never in the HTML that my code receives. I even tried a different parser, html5lib, but that did not fix the problem. Could someone kindly tell me what to do?
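
To make point 3 easy to reproduce, here is a minimal sketch of the kind of check involved, using only the standard library. It is not my exact code: `ClassFinder`, `tags_with_class`, and both sample strings are hypothetical. `rendered_html` is the snippet from above, and `fetched_html` (including the `plumx-placeholder` class name) is a made-up stand-in for what the downloaded page might look like.

```python
from html.parser import HTMLParser


class ClassFinder(HTMLParser):
    """Collect the tag names of elements that carry a given CSS class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value can be None
        classes = (dict(attrs).get("class") or "").split()
        if self.css_class in classes:
            self.tags.append(tag)


def tags_with_class(html_text, css_class):
    """Return the tag names in html_text whose class list contains css_class."""
    finder = ClassFinder(css_class)
    finder.feed(html_text)
    return finder.tags


# Hypothetical stand-ins: markup seen in the browser vs. markup a plain fetch returns.
rendered_html = """<li class="plx-citation">
<span class="ppp-label">Citation Indexes: </span>
<span class="ppp-count">95</span>
</li>"""
fetched_html = '<div id="maincontent"><div class="plumx-placeholder"></div></div>'

print(tags_with_class(rendered_html, "ppp-count"))  # ['span']
print(tags_with_class(fetched_html, "ppp-count"))   # []
```

Running the same check on the real downloaded text (for example `paper_url.text`, assuming `paper_url` is a requests response) gives an empty list for me, which would explain why neither the class lookup nor the XPath matches anything. The first result also shows that `ppp-count` sits on a `span` rather than an `li`, so the lookup in attempt 1 would need `span` even if the markup were present.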