How to download a file from a page using Python

Question

I am having trouble downloading the txt file from this page: https://www.ceps.cz/en/all-data#RegulationEnergy (scroll down to see Download: txt, xls and xml).

My goal is to create a scraper that goes to the linked page, clicks the txt link (for example), and saves the downloaded file.

Main problems that I am not sure how to solve:

The file doesn't have a static link that I can request directly; the link is built by JavaScript from the selected filters and file type.
When I use the requests library for Python and call the link with all the headers, it just redirects me to https://www.ceps.cz/en/all-data .

Approaches tried:

Using a scraper such as ParseHub to download the link didn't work as intended, although it came closest to what I wanted.
Using the requests library to call the link with the same headers that the XHR request sends when downloading the file, but it just redirects me to https://www.ceps.cz/en/all-data .

If you could propose a solution for this task, I'd appreciate it. Thank you in advance. :-)

score 2 · Answer 1 · answered Sep 04 '18 at 18:12

You can download this data to a directory of your choice with Selenium; you just need to specify the directory where the data will be saved. In the example below, I save the txt data to my desktop:

from selenium import webdriver
from selenium.webdriver.common.by import By

# Directory where Chrome should save downloaded files
download_dir = '/Users/doug/Desktop/'

chrome_options = webdriver.ChromeOptions()
prefs = {'download.default_directory': download_dir}
chrome_options.add_experimental_option('prefs', prefs)
driver = webdriver.Chrome(options=chrome_options)
driver.get('https://www.ceps.cz/en/all-data')

# Click the first <li> inside the download container
container = driver.find_element(By.CLASS_NAME, 'download-graph-data')
button = container.find_element(By.TAG_NAME, 'li')
button.click()
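
One thing a script like this may need is a way to wait for the file: the click returns right away, while Chrome writes the download in the background. Below is a minimal sketch of one way to wait, assuming Chrome marks in-progress downloads with a `.crdownload` suffix. The helper names `list_files` and `wait_for_new_file` are hypothetical and not part of Selenium.

```python
import time
from pathlib import Path


def list_files(download_dir):
    """Snapshot the files currently in download_dir (take this before clicking)."""
    return set(Path(download_dir).iterdir())


def wait_for_new_file(download_dir, before, timeout=30.0, poll_interval=0.5):
    """Poll download_dir until a finished file that is not in `before` appears.

    Assumption: unfinished Chrome downloads end in ".crdownload", so those
    are ignored until the browser renames them.
    """
    directory = Path(download_dir)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        finished = [
            path for path in directory.iterdir()
            if path not in before and path.suffix != ".crdownload"
        ]
        if finished:
            return finished[0]
        time.sleep(poll_interval)
    raise TimeoutError(f"No finished download appeared in {directory}")


# Intended use around the click (not executed here):
#   before = list_files(download_dir)
#   button.click()
#   saved_path = wait_for_new_file(download_dir, before)
```

Taking the snapshot before the click matters: if it were taken afterwards, a fast download could already be on disk and would never count as new.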

Hi @duhaime, good solution. Can you tell me how to read the HTML content through Selenium? — Naga kiran, Oct 01 '18 at 17:26
@NagaKiran Sure thing, using the code above, we'd call `driver.page_source` - that will return the HTML for the current page. I hope that helps! — duhaime, Oct 01 '18 at 17:37

score 0 · Answer 2 · Federico Rubbi · 2018-09-05T08:10:48.463

You can do it like this:

import requests

txt_format = 'txt'
xls_format = 'xls'  # written in binary mode
xml_format = 'xml'  # written in binary mode

def download(file_type):
    # Request the export in the format the caller asked for
    url = f'https://www.ceps.cz/download-data/?format={file_type}'

    response = requests.get(url)
    # Fail early on an HTTP error status
    response.raise_for_status()

    if file_type == txt_format:
        # Text export: write the decoded string
        with open(f'file.{file_type}', 'w') as file:
            file.write(response.text)
    else:
        # xls/xml exports: write the raw bytes
        with open(f'file.{file_type}', 'wb') as file:
            file.write(response.content)

download(txt_format)

edited Sep 05 '18 at 08:10

answered Sep 05 '18 at 08:00

Federico Rubbi


You should open the file in `wb` mode and write `response.content`. – Keyur Potdar Sep 05 '18 at 08:02
Since he wants to download a _txt file_ and `response.text` is of type _str_, it's preferable to open the file in _'w'_ mode. – Federico Rubbi Sep 05 '18 at 08:04
But for the xls and xml files? – Keyur Potdar Sep 05 '18 at 08:05
In that case, use 'wb' mode. I added those variables just to let him know how to implement it. However, I'll edit the answer, thank you! – Federico Rubbi Sep 05 '18 at 08:07
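
The exchange above suggests a simpler variant: write the raw bytes in `'wb'` mode for every format, txt included, so no branch on the file type is needed and the bytes land on disk exactly as the server sent them. A minimal sketch, where `save_bytes` is a hypothetical helper and `payload` stands in for `response.content`:

```python
from pathlib import Path


def save_bytes(payload, file_type, directory="."):
    """Write raw bytes to file.<file_type> in binary mode and return the path.

    Hypothetical helper: `payload` is expected to be bytes, for example
    the `response.content` of a requests response.
    """
    target = Path(directory) / f"file.{file_type}"
    with open(target, "wb") as file:
        file.write(payload)
    return target


# Intended use with the answer's code (not executed here):
#   save_bytes(response.content, 'txt')
```

Writing bytes also sidesteps any encoding guess or newline translation that text mode would apply, at the cost of not getting a decoded `str` to inspect.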
