
I want to download all the PDFs linked from this webpage using my Java program: https://www.bseindia.com/corporates/ann.html

I used jsoup to connect to the website and extract all links, but the PDF links are not among them. The HTML page source does not contain the PDF links either, yet Chrome Developer Tools (Inspect Element) does show them.

So how can I access the links of PDF files shown in inspect element using jSoup?

Sprint T
  • The page uses Angular.js to create its contents. jsoup, on the other hand, is only an HTML parser and does not run any JavaScript. What you see through jsoup is essentially the page as served, with JS disabled. The "Inspect" view shows the generated HTML after all the JS has run, which is why it differs. I don't think jsoup will help you much for this use case, as it does not simulate a full browser and only handles DOM parsing and manipulation – Roland Kreuzer Feb 04 '22 at 11:10
  • You could check the URL Angular fetches its data from for the info you are looking for: https://api.bseindia.com/BseIndiaAPI/api/AnnGetData/w?pageno=1&strCat=-1&strPrevDate=20220204&strScrip=&strSearch=P&strToDate=20220204&strType=C However, since the devs probably did not intend this as a stable API, parsing it might break sooner or later – Roland Kreuzer Feb 04 '22 at 11:15
  • This is helpful, thanks. I used the HttpURLConnection class to fetch the content of the URL https://api.bseindia.com/BseIndiaAPI/api/AnnGetData/w?pageno=1&strCat=-1&strPrevDate=20220204&strScrip=&strSearch=P&strToDate=20220204&strType=C but the InputStream is null. Any idea why? – Sprint T Feb 05 '22 at 16:23
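
For the last comment, here is a minimal sketch of fetching that endpoint with HttpURLConnection. It rests on assumptions that are not confirmed anywhere in this thread: that the server may reject requests lacking browser-like headers (the User-Agent and Referer values below are guesses), and that the empty stream comes from a non-2xx response, in which case the body is available from getErrorStream() rather than getInputStream(). The class name AnnouncementFetcher and its helper methods are hypothetical. The response is printed raw, since the JSON layout is not documented here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class; names are illustrative only.
class AnnouncementFetcher {

    // Endpoint taken from the comment above; it is not a documented, stable API.
    static final String ENDPOINT =
            "https://api.bseindia.com/BseIndiaAPI/api/AnnGetData/w";

    // Rebuilds the query string seen in the comment for a given date (yyyyMMdd) and page.
    static String buildUrl(String date, int page) throws IOException {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("pageno", String.valueOf(page));
        params.put("strCat", "-1");
        params.put("strPrevDate", date);
        params.put("strScrip", "");
        params.put("strSearch", "P");
        params.put("strToDate", date);
        params.put("strType", "C");

        StringBuilder sb = new StringBuilder(ENDPOINT);
        char separator = '?';
        for (Map.Entry<String, String> e : params.entrySet()) {
            sb.append(separator)
              .append(e.getKey())
              .append('=')
              .append(URLEncoder.encode(e.getValue(), "UTF-8"));
            separator = '&';
        }
        return sb.toString();
    }

    // Reads a stream fully as UTF-8 text; returns "" when the stream is null.
    static String readAll(InputStream in) throws IOException {
        if (in == null) {
            return "";
        }
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            char[] buffer = new char[4096];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                sb.append(buffer, 0, n);
            }
        }
        return sb.toString();
    }

    // Performs the GET and returns "<status>\n<body>", using the error stream on 4xx/5xx.
    static String fetch(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        try {
            conn.setRequestMethod("GET");
            conn.setConnectTimeout(10_000);
            conn.setReadTimeout(10_000);
            // Assumed headers: a browser-like User-Agent and the site as Referer.
            conn.setRequestProperty("User-Agent", "Mozilla/5.0");
            conn.setRequestProperty("Referer", "https://www.bseindia.com/");
            conn.setRequestProperty("Accept", "application/json");

            int status = conn.getResponseCode();
            InputStream body = status < 400 ? conn.getInputStream() : conn.getErrorStream();
            return status + "\n" + readAll(body);
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) {
        try {
            System.out.println(fetch(buildUrl("20220204", 1)));
        } catch (IOException e) {
            System.err.println("Request failed: " + e.getMessage());
        }
    }
}
```

Printing the status code first makes it easier to tell a rejected request apart from a genuinely empty response. If the status is 200 and the body is JSON, the PDF file names would then have to be located in that JSON by inspecting it, since its field names are not given in this thread.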

0 Answers