
I wrote a script to get historical data from the public Trades endpoint of the Kraken API. The code is as follows:

import pandas as pd
import json
import time
import urllib.request

def get_data(pair, since, until):
    global data
    global query
    global json_response
    global api_data
    
    data_columns= ["price", "volume", "time", "buy/sell", "market/limit", "miscellaneous"]
    data = pd.DataFrame(columns= data_columns)
    
    api_start = since
    app_start_time = time.time()
    counter = 1
    
    while api_start < until:
        last_time = time.time()
        api_domain = "https://api.kraken.com/0/public/Trades" + \
                    "?pair=%(pair)s&since=%(since)s" % {"pair":pair, "since": api_start}
        api_request = urllib.request.Request(api_domain)
        try:
            api_data = urllib.request.urlopen(api_request).read()
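        except Exception:
            # Pause and retry; without `continue`, a stale or undefined api_data would be parsed below
            time.sleep(3)
            continue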
        api_data = json.loads(api_data) 
        if len(api_data["error"]) != 0:
            print(api_data["error"])
            time.sleep(3)
            continue 
        query = pd.DataFrame(api_data["result"][pair], columns = data_columns)
        data = data.append(query, ignore_index= True)
        api_start = int(api_data["result"]["last"][:10])
        counter +=1
        time.sleep(1)    
        print("Request number: %s" %counter)
        print("Time since start: %s minutes" % round((time.time() - app_start_time)/60,2))
        print("Time since last request: %s seconds" % round((time.time() - last_time),2))
        print("last: %s" %api_start)
        print("")

get_data("XXBTZUSD", 1414761200, 1455761200)

After some successful responses, I get flawed responses like this: [Screenshot of Kraken API response]

As you can see, at some point the UNIX timestamp simply jumps from 142894080.33775 to 1654992002.801943, which results in wrong data.

Is that a problem with my code or with the API?

Thanks in advance.

petezurich
Dalogh

1 Answer

I took the liberty of simplifying your code and cannot confirm your observation: I get proper timestamps.

Try this:

import json

import pandas as pd
import requests

def get_data(pair, since):
    url = f"https://api.kraken.com/0/public/Trades?pair={pair}&since={since}"
    api_data = requests.get(url)
    api_data = json.loads(api_data.content) 
    return api_data

results = get_data("XBTUSD", 1414761200)

columns= ["price", "volume", "time", "buy/sell", "market/limit", "miscellaneous"]
df = pd.DataFrame(results["result"]["XXBTZUSD"], columns=columns)
df.time = df.time.astype(int)
df.head()

Print out:

    price   volume  time    buy/sell    market/limit    miscellaneous
0   340.09209   0.02722956  1414815678  s   m   
1   340.15346   0.21604000  1414820304  s   m   
2   340.00000   0.03395999  1414820304  s   m   
3   340.00001   0.01000000  1414821818  s   l   
4   340.00000   0.25668009  1414821818  s   l   

Edit:

Using pagination I can confirm the jump in timestamps. The problem very likely lies with the API.

import time
from datetime import datetime

def get_data(pair, since):
    url = f"https://api.kraken.com/0/public/Trades?pair={pair}&since={since}"
    api_data = requests.get(url)
    api_data = json.loads(api_data.content) 
    return api_data

start_ts = 1414761200

frames = []

for _ in range(30):
    print(start_ts)
    print(datetime.fromtimestamp(int(start_ts)))
    tmp = get_data("XBTUSD", start_ts)
    # Keep the first 10 characters of "last" as the next "since"
    start_ts = tmp["result"]["last"][:10]
    # Store the page that was just fetched
    frames.append(pd.DataFrame(tmp["result"]["XXBTZUSD"]))
    time.sleep(2)

Print out after a couple of iterations:

1438313128
2015-07-31 05:25:28
1653648031
2022-05-27 12:40:31
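
To catch such a jump while paginating rather than afterwards, one option is to compare each new cursor with the previous one and flag any step that looks implausibly large. The following is only a minimal sketch: `find_jumps`, `to_utc`, and the 30-day threshold are hypothetical names and values chosen for illustration, not part of the Kraken API, and a sensible threshold depends on how sparse the trades are for the pair and period. It assumes the cursors are integers or digit strings whose first 10 characters are whole seconds, as in the loop above.

```python
from datetime import datetime, timezone

# Hypothetical threshold: treat any step of more than 30 days as suspicious.
MAX_STEP_SECONDS = 30 * 24 * 60 * 60


def find_jumps(cursors, max_step=MAX_STEP_SECONDS):
    """Return (previous, current) pairs whose gap exceeds max_step seconds."""
    # Keep only the first 10 characters (whole seconds), as the loop above does.
    seconds = [int(str(c)[:10]) for c in cursors]
    return [(a, b) for a, b in zip(seconds, seconds[1:]) if b - a > max_step]


def to_utc(ts):
    """Format a UNIX timestamp (seconds) as an ISO 8601 string in UTC."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).isoformat()


# The two consecutive cursors from the print out above.
cursors = ["1438313128", "1653648031"]
for prev, cur in find_jumps(cursors):
    print(f"jump: {to_utc(prev)} -> {to_utc(cur)}")
```

Inside the pagination loop, the same comparison could be used to stop, or to retry the request, before a flawed page is appended to `frames`.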
petezurich
  • Thanks for your answer. For a single request like yours, I also get proper timestamps. However, as I want to get more than 1000 datapoints, I use pagination. After 10 requests or so (10*1000 = 10,000 datapoints), I get wrong timestamps and messed-up data. – Dalogh Jun 25 '22 at 12:20
  • Can you add the pagination to your code to make this reproducible? I don't see that in Kraken's API documentation. – petezurich Jun 25 '22 at 12:33
  • It's this line, which you can also find in the code above: `api_start = int(api_data["result"]["last"][:10])`. It saves the "last" parameter for the next request. You can find the documentation under responses -> result -> last: https://docs.kraken.com/rest/#operation/getRecentTrades – Dalogh Jun 25 '22 at 13:10
  • Tried it with pagination. Now I get the same jump in the timestamps. The problem seems to be with the API. I don't see any other sensible conclusion. – petezurich Jun 25 '22 at 13:58
  • One more observation – if I start with timestamp `1500000000` I get e.g. 30k results without the jump. Maybe there is a data gap for very old data (you start in 2014) which results in the observed behaviour. – petezurich Jun 25 '22 at 14:04
  • Glad that you at least have the same problem. However, I don't think there is a data gap, as the jumps don't always occur at the same point in time but at random: sometimes after 10 requests, sometimes after 30. I will contact Kraken and ask if they know about this problem. Thanks for your help! – Dalogh Jun 25 '22 at 14:41
  • You're welcome. And good luck! Fingers crossed that you can solve this. – petezurich Jun 25 '22 at 15:18