
I'm using Scrapy with the Splash plugin. A button on the page triggers a file download via AJAX, and I need to get the downloaded file, but I don't know how.

My Lua script looks something like this:

    function main(splash)
        splash:init_cookies(splash.args.cookies)
        assert(splash:go{
            splash.args.url,
            headers=splash.args.headers,
            http_method=splash.args.http_method,
            body=splash.args.body,
        })
        assert(splash:wait(0.5))
        local get_dimensions = splash:jsfunc([[
            function () {
                var rect = document.querySelector('a[aria-label="Download XML"]').getClientRects()[0];
                return {"x": rect.left, "y": rect.top}
            }
        ]])
        splash:set_viewport_full()
        splash:wait(0.1)
        local dimensions = get_dimensions()
        -- FIXME: button must be inside a viewport
        splash:mouse_click(dimensions.x, dimensions.y)
        splash:wait(0.1)
        return splash:html()
    end

My request object from my spider:

    yield SplashFormRequest(self.urls['url'],
                            formdata=FormBuilder.build_form(response, some_object[0]),
                            callback=self.parse_cuenta,
                            cache_args=['lua_source'],
                            endpoint='execute',
                            args={'lua_source': self.script_click_xml})

Thanks in advance

delpo

1 Answer


I just tried this with SplashFormRequest, and it looks like Splash won't work for this. Instead, you can send the same AJAX request yourself using the Python Requests library.

Here is an example:

    import requests

    # viewstate, viewstategen, eventvalid, submit_url and item are assumed to be
    # defined earlier, e.g. taken from the page that contains the button.
    data = {'__EVENTTARGET': 'main_0$body_0$lnkDownloadBio',
            '__EVENTARGUMENT': '',
            '__VIEWSTATE': viewstate,
            '__VIEWSTATEGENERATOR': viewstategen,
            '__EVENTVALIDATION': eventvalid,
            'search': '',
            'filters': '',
            'score': ''}

    HEADERS = {
        'Content-Type': 'application/x-www-form-urlencoded',
        'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.101 Safari/537.36',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
    }

    # requests form-encodes a dict passed as data, so no manual urlencode is needed
    r = requests.post(submit_url, data=data, allow_redirects=False, headers=HEADERS)
    filename = 'name-%s.pdf' % item['first_name']
    with open(filename, 'wb') as f:
        f.write(r.content)
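
Before writing `r.content` to disk, it may help to check that the server really returned a file and not an HTML error page. The sketch below is one possible way to do that with the standard library: it reads the `Content-Disposition` header and returns the suggested file name, or `None` if there is none. The helper name `attachment_filename` is hypothetical, and it assumes the site sends a `Content-Disposition` header for downloads, which not every site does.

```python
from email.message import Message


def attachment_filename(headers):
    """Return the file name from a Content-Disposition header, or None.

    `headers` is any mapping of response headers, such as `r.headers`.
    """
    disposition = headers.get("Content-Disposition")
    if not disposition:
        return None
    # Reuse the stdlib header parser instead of splitting the value by hand.
    msg = Message()
    msg["Content-Disposition"] = disposition
    return msg.get_filename()
```

With the request above, `attachment_filename(r.headers)` could replace the hard-coded name when it returns a value, falling back to `'name-%s.pdf'` otherwise.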

Please make sure the data and headers you are sending are correct.
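
The `__VIEWSTATE`, `__VIEWSTATEGENERATOR` and `__EVENTVALIDATION` values usually come from hidden inputs on the page that holds the button, so one way to keep the data correct is to read them from that page right before posting. Here is a minimal sketch using only the standard library. The `HiddenInputParser` class, the `build_postback_data` helper and the sample HTML are hypothetical and only for illustration; the real page may need extra fields.

```python
from html.parser import HTMLParser


class HiddenInputParser(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attrs = dict(attrs)
        is_hidden = (attrs.get("type") or "").lower() == "hidden"
        if is_hidden and attrs.get("name"):
            self.fields[attrs["name"]] = attrs.get("value") or ""


def build_postback_data(html, event_target):
    """Build a form payload from the page's hidden fields plus the event target."""
    parser = HiddenInputParser()
    parser.feed(html)
    data = dict(parser.fields)
    data["__EVENTTARGET"] = event_target
    data["__EVENTARGUMENT"] = ""
    return data


# Made-up page fragment standing in for the real response body.
sample_html = """
<form>
  <input type="hidden" name="__VIEWSTATE" value="abc123" />
  <input type="hidden" name="__VIEWSTATEGENERATOR" value="CA0B0334" />
  <input type="hidden" name="__EVENTVALIDATION" value="xyz789" />
  <input type="text" name="search" value="" />
</form>
"""
data = build_postback_data(sample_html, "main_0$body_0$lnkDownloadBio")
```

The resulting `data` dict can be passed to `requests.post(..., data=data)` as in the example above. Inside a Scrapy callback, the same function could be given `response.text`.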

Sanoop PK