4

I'm using the Python requests library for the first time, and I'm confused. When I run the function below in a for loop with different base URLs, it appears to receive a response, but the content that comes back is the same for both URLs.

If I look at the API URLs in my browser, I can see that it's the content for the first URL that's being returned both times. What am I missing?

import requests

base_urls = ['http://kaweki.wikia.com/', 'http://solarmovie.wikia.com/']


def getEdits(wikiObj, limit=500):     
    payload = {'limit': limit}                             
    r = requests.get('{}api/v1/Activity/LatestActivity'.format(wikiObj),
                     params=payload)
    edits = r.json()
    return edits['items']

for url in base_urls:
    print getEdits(url)  
Jeremy
  • I looked it over and can't see anything that's obviously problematic. Assuming both destination URLs behave the same and have different content, I can't explain why the results would appear the same. – Andrew Gorcester Mar 05 '15 at 20:58
  • Could it be a bug on their side? Maybe it's caching where it shouldn't? – Jeremy Mar 05 '15 at 20:59
  • I doubt it's caching that's the problem exactly, but I agree that it's totally possible the API is broken. It may also simply be hard to use, returning irrelevant results because of a non-obvious usage detail. – Andrew Gorcester Mar 05 '15 at 21:06
  • It's not only Python; it's the same when you place the request using `curl`. Funnily enough, when you run the requests in your browser (Chrome) you *will* get different results. I tried adding the header `'Cache-Control': 'no-cache'` to the request, but it didn't solve it. To debug further we would need the server-side Apache logs to see why it treats them as the same request when they come from code/curl and as different requests when they're placed through a browser. – Nir Alfasi Mar 05 '15 at 21:13
  • @Jeremy problem solved - see more details in my answer below. – Nir Alfasi Mar 05 '15 at 22:02

3 Answers

4

There is a bug on the server side that causes cache-control headers and the like to be ignored for a period of time.

Introducing a sleep of 5 seconds (maybe even shorter) works around the bug. I've marked the lines that were added below:


import requests
import json
from time import sleep #ADDED

base_urls = ['http://kaweki.wikia.com/', 'http://solarmovie.wikia.com/']


def getEdits(wikiObj, limit=500):       
    payload = {'limit': limit}   
    url = '{}api/v1/Activity/LatestActivity'.format(wikiObj)
    r = requests.get(url, params=payload) 
    edits = json.loads(r.content)
    return edits['items']

for url in base_urls:    
    print getEdits(url)  
    sleep(5) # ADDED

OUTPUT

[{u'article': 1461, u'revisionId': 14, u'user': 26127114, u'timestamp': 1424389645}, {u'article': 1461, u'revisionId': 13, u'user': 26127114, u'timestamp': 1424389322}, {u'article': 1461, u'revisionId': 12, u'user': 26127114, u'timestamp': 1424389172}, {u'article': 1461, u'revisionId': 5, u'user': 26127114, u'timestamp': 1424388924}]
[{u'article': 1461, u'revisionId': 14, u'user': 26127165, u'timestamp': 1424389107}, {u'article': 1461, u'revisionId': 7, u'user': 26127165, u'timestamp': 1424388706}]
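
An alternative workaround, not part of the answer above and untested against this API, is to make every request URL unique with a throwaway cache-busting parameter, on the assumption that the stale responses come from a cache keyed on the full URL:

import time
import requests

def getEditsNoCache(wikiObj, limit=500):
    # 'nocache' is a made-up parameter name; the API presumably ignores unknown
    # parameters, but its ever-changing value makes each URL unique, which may
    # be enough to sidestep a URL-keyed cache.
    payload = {'limit': limit, 'nocache': int(time.time() * 1000)}
    url = '{}api/v1/Activity/LatestActivity'.format(wikiObj)
    r = requests.get(url, params=payload)
    return r.json()['items']
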
Nir Alfasi
  • Hmm, very interesting. But I think the `Host:` header might be a red herring. After all, `requests` should already add that by itself. And it does in fact seem to work with just the 5s delay. – Lukas Graf Mar 05 '15 at 22:28
  • @LukasGraf yes, you are right indeed! I'll update the answer - thanks! – Nir Alfasi Mar 05 '15 at 22:30
3

The API endpoints are "broken". Refreshing the two endpoints repeatedly in a browser has them switching back and forth between two responses. You can replicate it by refreshing one request half a dozen times, then refreshing the other request half a dozen times, and switching back and forth every half a dozen requests.

Request A:

http://solarmovie.wikia.com/api/v1/Activity/LatestActivity

Request B:

http://kaweki.wikia.com/api/v1/Activity/LatestActivity

Response 1:

{
    items: [
        {
            article: 1461,
            user: 26127114,
            revisionId: 14,
            timestamp: 1424389645
        },
        {
            article: 1461,
            user: 26127114,
            revisionId: 13,
            timestamp: 1424389322
        },
        {
            article: 1461,
            user: 26127114,
            revisionId: 12,
            timestamp: 1424389172
        },
        {
            article: 1461,
            user: 26127114,
            revisionId: 5,
            timestamp: 1424388924
        }
    ],
    basepath: "http://kaweki.wikia.com"
}

Response 2:

{
    items: [
        {
            article: 1461,
            user: 26127165,
            revisionId: 14,
            timestamp: 1424389107
        },
        {
            article: 1461,
            user: 26127165,
            revisionId: 7,
            timestamp: 1424388706
        }
    ],
    basepath: "http://solarmovie.wikia.com"
}
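
A minimal way to watch this from Python (a sketch that assumes the `basepath` field shown above is always present; it is not part of the original answer) is to hit both endpoints a few times in a row and compare the host you asked for with the one the API reports:

import requests

endpoints = [
    'http://kaweki.wikia.com/api/v1/Activity/LatestActivity',
    'http://solarmovie.wikia.com/api/v1/Activity/LatestActivity',
]

# When the bug shows up, the requested host and the reported basepath disagree.
for url in endpoints * 3:
    r = requests.get(url, params={'limit': 500})
    print('requested {} -> basepath {}'.format(url, r.json().get('basepath')))
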
abraham
  • While it's true, it's not really helpful/useful! Further, can you reproduce the browser behavior with curl/Python? – Nir Alfasi Mar 05 '15 at 21:23
  • I've sent an email to their API team to let them know that it looks like there's a bug. Thanks so much! – Jeremy Mar 05 '15 at 21:24
  • @alfasin the question has the Python that originally showed the results. How is answering the question not useful? – abraham Mar 05 '15 at 21:25
    I experienced the same behavior while testing this. So I think the correct answer here is in fact "this API is broken". – Lukas Graf Mar 05 '15 at 21:41
  • BTW: All the subdomains on `wikia.com` are just aliases (`CNAME` records) for `wikia.com`. So the distinction between different wikis has to happen via some sort of named virtual hosting based on the `Host:` header, which the API or some caching proxy in between seems to mess up. – Lukas Graf Mar 05 '15 at 21:43
  • I believe that @LukasGraf is correct. Something weird though: if I hardcode the Host header (i.e. `'Host': 'kaweki.wikia.com'`) I get a different answer per request, but when I extract the host from `wikiObj` the bug is back. There's something messy here, which I believe is a combination of Python's dynamic binding and the subdomain configuration on the server side. – Nir Alfasi Mar 05 '15 at 21:49
  • @abraham The OP showed a problem and you said "hey, there's a problem", then went into more detail showing how to repro the problem in a browser, but you didn't pinpoint the issue or introduce a way to fix/work around it. That's why I wrote it's not useful. – Nir Alfasi Mar 05 '15 at 21:51
0

I downloaded and ran the script and got apparently identical output. There doesn't seem to be anything wrong with the script, though! I think the output simply is identical, for some reason. Try changing `return edits['items']` to just `return edits` and you'll see that the output differs in that case. If there really is a bug in the code, that should help you isolate it; if not, then maybe you can figure out why the real output looks like that.
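
For example, a sketch of that suggestion: the function from the question with only the return line changed.

import requests

def getEdits(wikiObj, limit=500):
    payload = {'limit': limit}
    r = requests.get('{}api/v1/Activity/LatestActivity'.format(wikiObj),
                     params=payload)
    edits = r.json()
    # Return the full response rather than just 'items'; extra fields such as
    # 'basepath' (visible in the other answer's dumps) make it clearer which
    # wiki each response actually came from.
    return edits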

Andrew Gorcester