I have written some code to scrape BTC/ETH time series from investing.com and it works fine. However, I need to alter the requests call so that the downloaded data comes from Kraken rather than the default Bitfinex, and starts from 01/06/2016 instead of the default start date. These options can be set manually on the web page, but I have no idea how to send them via the requests call, except that it may involve using the "data" parameter. Grateful for any advice.
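To illustrate what I mean by the "data" parameter, here is a minimal sketch of how requests turns a dictionary into form fields. The field names (exchange, start_date, end_date), their values and the example.com URL are placeholders I made up; I do not know what investing.com actually expects. The request is only prepared, never sent, so the encoded body can be inspected safely.

```python
import requests

# Hypothetical form fields: the real names and values used by the site are unknown.
payload = {
    "exchange": "Kraken",
    "start_date": "01/06/2016",
    "end_date": "31/12/2016",
}

# Build the request without sending it, so the encoded body can be inspected.
request = requests.Request(
    "POST",
    "https://example.com/historical-data",  # placeholder URL, not the real endpoint
    headers={"X-Requested-With": "XMLHttpRequest"},
    data=payload,
)
prepared = request.prepare()

print(prepared.method)                   # HTTP method that would be used
print(prepared.headers["Content-Type"])  # how the body is encoded
print(prepared.body)                     # the form fields as they would be sent
```

A dictionary passed as "data" ends up form-encoded in the request body, which is what a browser form normally sends with a POST. Whether the site accepts such a body, and under which field names, is exactly what I am unsure about.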
Thanks,
KM
Code already written in Python, which works fine for the defaults:
import requests
from bs4 import BeautifulSoup
import os
import numpy as np
# BTC scrape https://www.investing.com/crypto/bitcoin/btc-usd-historical-data
# ETH scrape https://www.investing.com/crypto/ethereum/eth-usd-historical-data
ticker_list = [x.strip() for x in open("F:\\System\\PVWAVE\\Crypto\\tickers.txt", "r").readlines()]
urlheader = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36",
    "X-Requested-With": "XMLHttpRequest"
}
# Placeholder: the exchange and date-range options would go here (contents unknown)
payload = None
print("Number of tickers: ", len(ticker_list))
for ticker in ticker_list:
    print(ticker)
    url = "https://www.investing.com/crypto/"+ticker+"-historical-data"
    req = requests.get(url, headers=urlheader, data=payload)
    soup = BeautifulSoup(req.content, "lxml")
    table = soup.find('table', id="curr_table")
    split_rows = table.find_all("tr")
    # Tickers contain '/', so each one is written into its own sub-folder
    newticker = ticker.replace('/', '\\')
    output_filename = "F:\\System\\PVWAVE\\Crypto\\{0}.csv".format(newticker)
    os.makedirs(os.path.dirname(output_filename), exist_ok=True)
    output_file = open(output_filename, 'w')
    # First row is the header; the remaining rows are reversed into oldest-first order
    header_list = split_rows[0:1]
    split_rows_rev = split_rows[:0:-1]
    for row in header_list:
        columns = list(row.stripped_strings)
        columns = [column.replace(',', '') for column in columns]
        if len(columns) == 7:
            output_file.write("{0}, {1}, {2}, {3}, {4}, {5}, {6} \n".format(columns[0], columns[2], columns[3], columns[4], columns[1], columns[5], columns[6]))
    for row in split_rows_rev:
        columns = list(row.stripped_strings)
        columns = [column.replace(',', '') for column in columns]
        if len(columns) == 7:
            output_file.write("{0}, {1}, {2}, {3}, {4}, {5}, {6} \n".format(columns[0], columns[2], columns[3], columns[4], columns[1], columns[5], columns[6]))
    output_file.close()
Data is downloaded for the default exchange and default date range, but I want to specify Kraken and my own start and end dates (01/06/16 and the last full day, i.e. always yesterday).
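For the date range itself, a minimal sketch of how I would compute it is below. It assumes 01/06/16 means 1 June 2016 (day first) and that the dates need to be sent as day/month/year strings; the date_range helper and DATE_FORMAT are names of my own, and the format the site really expects may well be different.

```python
from datetime import date, timedelta

# Assumption: dates are sent as day/month/year strings; the site may expect another format.
DATE_FORMAT = "%d/%m/%Y"


def date_range(today=None, start=date(2016, 6, 1)):
    """Return (start, end), where end is the last full day, i.e. yesterday."""
    if today is None:
        today = date.today()
    return start, today - timedelta(days=1)


start, end = date_range()
print(start.strftime(DATE_FORMAT), end.strftime(DATE_FORMAT))
```

The two formatted strings are what I would then want to pass along with the request, together with whatever identifies Kraken as the exchange.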