Unfortunately, I cannot offer a reproducible dataset. I'm connecting to the GoodData API to pull report data. The connection and download usually succeed, but occasionally the download fails. It always fails at the same point in the script, and I can't figure out why it works sometimes and not others.

First, I connect to the GoodData API and get a temporary token.

I created the function below to download the report. Its parameters are the project ID within GoodData, the temporary token I received from authenticating, the file name I want the output saved under, and the uri that I receive from calling the specific project and report ID. The uri is essentially the location of the data.

The uri looks something like this (not a real uri):

'{"uri":"/gdc/projects/omaes11n7jpaisfd87asdfhbakjsdf87adfbkajdf/execute/raw/876dfa8f87ds6f8fd6a8ds7f6a8da8sd7f68as7d6f87af?q=as8d7f6a8sd7fas8d7fa8sd7f6a8sdf7"}'

from urllib2 import Request, urlopen
import re
import json
import pandas as pd
import os
import time

# Download a GoodData report as CSV and return it as a pandas DataFrame
def download_report(proj_id, temp_token, file_name, uri, write_to_file=True):
    headers = {
          'Accept': 'application/json',
          'Content-Type': 'application/json',
          'X-GDC-AuthTT': temp_token
        }

    uri2 = re.sub('{"uri":|}|"', '', uri)

    put_request = Request('https://secure.gooddata.com' + uri2, headers=headers)

    response = urlopen(put_request).read()

    with open(file_name + ".csv", "wb") as text_file:
        text_file.write(response)

    with open(file_name + ".csv", 'rb') as f:
        gd_data = pd.read_csv(f)

    if write_to_file:
        gd_data.to_csv(file_name + '.csv', index=False)
    return gd_data

The uri gets attached to the normal gooddata URL, along with the headers to extract the information into a text format which then gets converted into a csv/dataframe.
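As a side note on the URL-building step described above, here is a minimal sketch that parses the JSON payload with `json.loads` instead of stripping characters with a regular expression. The helper name `build_report_url` and the `base_url` parameter are hypothetical, introduced only for illustration; the sample payload is a shortened, made-up URI in the same shape as the one shown earlier.

```python
import json


def build_report_url(uri_payload, base_url="https://secure.gooddata.com"):
    """Return the full download URL for a payload shaped like '{"uri": "/gdc/..."}'.

    build_report_url and base_url are illustrative names, not part of any GoodData client.
    """
    # Parse the JSON document and take the value stored under the "uri" key.
    relative_uri = json.loads(uri_payload)["uri"]
    return base_url + relative_uri


# Made-up payload in the same shape as the example uri above.
sample_payload = '{"uri":"/gdc/projects/myproject/execute/raw/myreport?q=abc"}'
report_url = build_report_url(sample_payload)
```

Parsing the JSON keeps the extraction correct even if the uri itself ever contains a quote or brace, which the character-stripping approach would silently remove.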

For some reason, the dataframe sometimes comes back containing just the uri itself instead of the data behind the link. One more thing I find strange: when I launch Spyder and try this, it always fails the first time. If I run it again, it works, and I don't know why. Since I run this on a schedule, it runs successfully a couple of times a day for a few days and then just starts failing.

Matt W.
  • you receive JSON and save that as CSV ? Shouldn't you use pd.read_json() instead ? – hootnot Jul 02 '18 at 15:12
  • probably, I'm new to this. But that's not the issue I'm running into, since it correctly reads in sometimes. I think it's a timing issue where the download hasn't fully finished, so I try to convert the result to a dataframe when it's still the uri. – Matt W. Jul 02 '18 at 16:27
  • What does it do when you use *curl* ? You should be able to get your data too using *curl*. If that works you can exclude the url / parameters you sent / authentication issues. Are there online docs for the API request you want to make ? – hootnot Jul 02 '18 at 17:38
  • I also can recommend you to use *requests*: http://docs.python-requests.org/en/master/ instead of urllib2 – hootnot Jul 02 '18 at 18:11
  • @hootnot there are. here is the [documentation](https://help.gooddata.com/display/doc/API+Reference#/reference/dashboards-and-reporting/export-a-large-report/export-a-raw-report), also I used urllib2 because that was what the documentation showed – Matt W. Jul 02 '18 at 20:42
  • that documentation link contains a page with black *side doc*. This contains the dropbutton with example code, also for *curl*. So pls. check it out for your URL and execute it. – hootnot Jul 03 '18 at 10:08
  • I think it was the timing. It seems to be working now. I had to add `time.sleep(60)` after the first Request, to wait for the data to download, before the next Request. – Matt W. Jul 03 '18 at 15:25
  • urlopen blocks until you got your data. That can't be the problem. If you want it to timeout use the timeout parameter. – hootnot Jul 03 '18 at 15:41

2 Answers

1

The reason you sometimes get the URI of the data result rather than the data itself is that the result is not ready yet. Computing a report sometimes takes a while. Along with the URI you also get HTTP status 202, which means the request was accepted but the result is not finished yet.

Check the HTTP status with the getcode() method. If you get 202, request the URI again until you get 200, and then read the data result.
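The polling described above could be sketched roughly as follows. This is only an illustration under a few assumptions: it uses Python 3's `urllib.request` (where `Request` and `urlopen` from `urllib2` now live), and the function name `fetch_when_ready` and its parameters `poll_interval`, `max_attempts`, and `opener` are hypothetical names chosen for the example.

```python
import time
from urllib.request import Request, urlopen  # Python 3 home of urllib2's Request/urlopen


def fetch_when_ready(url, headers, poll_interval=5, max_attempts=60, opener=urlopen):
    """Request `url` repeatedly until the server answers 200, then return the body.

    All names here are illustrative. `opener` defaults to urlopen and can be
    swapped for a stand-in when trying the loop without network access.
    """
    for _ in range(max_attempts):
        response = opener(Request(url, headers=headers))
        status = response.getcode()
        if status == 200:
            # The report is ready: the body is the actual data.
            return response.read()
        if status != 202:
            raise RuntimeError("Unexpected HTTP status %s" % status)
        # 202: request accepted but the result is not computed yet, so wait and retry.
        time.sleep(poll_interval)
    raise RuntimeError("Report not ready after %d attempts" % max_attempts)
```

In the question's `download_report`, the single `urlopen(put_request).read()` call would be the place to use a loop like this, passing the same URL and headers. Capping the number of attempts keeps a scheduled job from waiting forever if the report never finishes.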

0

First, check whether you get a response with curl (make sure the URL is correct):

curl \
  -H "Content-Type: application/json" \
  -H "X-GDC-AuthTT: temp_token" \
  "https://secure.gooddata.com/gdc/projects/omaes11n7jpaisfd87asdfhbakjsdf87adfbkajdf/execute/raw/876dfa8f87ds6f8fd6a8ds7f6a8da8sd7f68as7d6f87af?q=as8d7f6a8sd7fas8d7fa8sd7f6a8sdf7"

hootnot