Scrapy unable to access child div class

Question

I am using Scrapy to scrape the href links in the table on this webpage: https://researchgrant.gov.sg/eservices/advanced-search/?keyword=&source=sharepoint&type=project&status=open&page=2&_pp_projectstatus=&_pp_hiname=ab&_pp_piname=pua&_pp_source=sharepoint&_pp_details=#project. I can select the div MVCGridTableHolder_advancesearchawardedprojectsp_, but I can't select its children (the div with class row and the div with a style attribute). My attempt is shown below. Is this because of the partial view?

HTML code:

<div id="MVCGridContainer_advancesearchawardedprojectsp_" data-key="" class="MVCGridContainer">
<!--Partial View!-->
<div class="row"></div>
<div style="overflow-x:auto;">
<table name="MVCGridTable_advancesearchawardedprojectsp" class="table table-striped table-bordered iris-grid">
<thead></thead>
<tbody>
      <tr>
         <td>
         <a class="grid-link" target="_top" href="https://researchgrant.gov.sg/pages/Awarded-Project-Detail.aspx?AXID=MOH-000080&amp;CompanyCode=moh">INVESTIGATING DIVERSIFIED BIFUNCTIONAL MACROCYCLES BY PHAGE DISPLAY AS A NOVEL TECHNOLOGY PLATFORM</a>
         </td>
      </tr>
</tbody>
</table>
</div></div>

Scrapy shell attempt:

In [12]: quote = response.xpath('//div[@id="MVCGridTableHolder_advancesearchawardedprojectsp_"]')

In [13]: quote
Out[13]: [<Selector xpath='//div[@id="MVCGridTableHolder_advancesearchawardedprojectsp_"]' data='<div id="MVCGridTableHolder_advancese...'>]

In [14]: quote = response.xpath('//div[@id="MVCGridTableHolder_advancesearchawardedprojectsp_"]/div[@class="row"]')

In [15]: quote
Out[15]: []

Two things: 1. don't post your code as an image; post it as text. 2. Most likely, the results are loaded dynamically by the browser with an additional request and rendered by the browser (that "partial view" comment there kind of refers to that) — alecxe, Dec 09 '19 at 03:06
Always put code, data, and the full error message as text in the question. — furas, Dec 09 '19 at 03:09
Where is your code? What command do you use to get it? What is the URL for this page? Is this page using JavaScript to add the elements? Scrapy can't run JavaScript. And add it as text: Python can't read code and data from an image. — furas, Dec 09 '19 at 03:12
Hey guys, thanks for the input; I have made some updates. Please bear with me, as I am really new to Scrapy. And yes, after checking, I think it uses JavaScript. Are there any recommendations for scraping webpages that use JavaScript? — adrian, Dec 09 '19 at 03:31
@alecxe Then is it possible to scrape the results in this case? — adrian, Dec 09 '19 at 03:37

score 0 · Accepted Answer · answered Dec 09 '19 at 03:45

If you open your browser's developer tools while loading this page, you will see a separate XHR request that loads the partial-view content. You can simulate that request in your code.

Example using requests:

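import requests


with requests.Session() as session:
    # Skip TLS certificate verification, as the original answer does;
    # drop this line if the certificate validates on your machine.
    session.verify = False

    # Mark the call as an AJAX (XHR) request; update() keeps the
    # session's default headers instead of replacing them all.
    session.headers.update({
        'X-Requested-With': 'XMLHttpRequest'
    })

    # The search filters go in the URL query string (params=),
    # while the grid name goes in the form-encoded POST body (data=).
    response = session.post("https://researchgrant.gov.sg/eservices/mvcgrid", params={
        'keyword': '',
        'source': 'sharepoint',
        'type': 'project',
        'status': 'open',
        'page': '2',
        '_pp_projectstatus': '',
        '_pp_hiname': 'ab',
        '_pp_piname': 'pua',
        '_pp_source': 'sharepoint',
        '_pp_details': ''},
        data={
            'name': 'advancesearchawardedprojectsp'
        })

    # Should contain the partial-view HTML with the results table
    print(response.text)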

In Scrapy, you could do the same thing with a FormRequest; see the Stack Overflow question "Send Post Request in Scrapy".

All right, thank you so much! I will check it out and mark it as the answer soon. Much appreciated! — adrian, Dec 09 '19 at 04:57
Thanks to your help I am now able to scrape the data, but the parameters don't seem to work in the FormRequest. Do you have any idea why? — adrian, Dec 10 '19 at 02:38
Setting 'hiname' to 'ab' and 'piname' to 'pua' should return only 1 result, but instead it returns all the results. — adrian, Dec 10 '19 at 02:46
