
I am working on a project to identify certain text in the body of Stack Overflow questions. It works in general, but not for this one case. I want to find out programmatically whether AWS access keys have been exposed, to understand the gravity of the situation. Here is the code:

import gzip
import time
import urllib.request
import zlib

from bs4 import BeautifulSoup

headers = {'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
           'Accept-Encoding': 'gzip, deflate',
           'Accept-Language': 'en-US,en;q=0.5',
          }

# Search question bodies for the exposed access key ID
url = 'https://api.stackexchange.com/2.2/search/advanced?order=desc&sort=activity&body=' + 'AKIAIHXBFL3ATI64QPAQ' + '&site=stackoverflow'

req = urllib.request.Request(url, headers=headers)
response = urllib.request.urlopen(req)
time.sleep(3)

# urllib does not decompress responses, so undo the Content-Encoding by hand
encoding = response.info().get('Content-Encoding')
if encoding == 'gzip':
    pagedata = gzip.decompress(response.read())
elif encoding == 'deflate':
    pagedata = zlib.decompress(response.read())  # zlib-wrapped deflate
elif encoding:
    # Fail loudly instead of continuing with pagedata undefined
    raise ValueError('Encoding type unknown: ' + encoding)
else:
    pagedata = response.read()

soup = BeautifulSoup(pagedata, "lxml")
print(soup)

Here is the response from soup:

<html><body><p>{"items":[],"has_more":false,"quota_max":300,"quota_remaining":291}</p></body></html>
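The response is actually JSON, not HTML; the `<html><body><p>` wrapper is added by BeautifulSoup's lxml parser. A minimal sketch that decodes the same body with the standard `json` module instead, which makes the empty `items` list explicit (the string below is copied from the output above; quota numbers will differ between calls):

```python
import json

# Raw JSON body as returned by /search/advanced in the question
raw = '{"items":[],"has_more":false,"quota_max":300,"quota_remaining":291}'

payload = json.loads(raw)

# "items" holds the matching questions; an empty list means no matches
if not payload["items"]:
    print("No questions matched the search.")
print("Quota remaining:", payload["quota_remaining"], "of", payload["quota_max"])
```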

It returns an empty result. If I search for other text in the body= parameter, it returns a long list of results. Am I doing something wrong, or can the API not search for text this specific?

edited by double-beep, asked by Digvijay Sawant

1 Answer


This looks like another API bug.

A workaround is to use the q parameter instead:
    /2.2/search/advanced?q=AKIAIHXBFL3ATI64QPAQ&site=stackoverflow

This gives the same results as the equivalent search on the live site (currently 2 questions).
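For reference, the `q` workaround translated back into the question's Python might look like the sketch below. Assumptions: the helper names `build_search_url`, `decode_body` and `search_questions` are hypothetical, the `order`/`sort` parameters are simply carried over from the question, and, as noted in this answer, only questions are returned. Nothing hits the network until `search_questions` is called.

```python
import gzip
import json
import urllib.parse
import urllib.request
import zlib

API_ROOT = "https://api.stackexchange.com/2.2/search/advanced"


def build_search_url(term, site="stackoverflow"):
    """Build a /search/advanced URL that searches with q= instead of body=."""
    params = {"order": "desc", "sort": "activity", "q": term, "site": site}
    return API_ROOT + "?" + urllib.parse.urlencode(params)


def decode_body(raw, content_encoding):
    """Undo the HTTP Content-Encoding of a response body."""
    if content_encoding == "gzip":
        return gzip.decompress(raw)
    if content_encoding == "deflate":
        return zlib.decompress(raw)  # assumes zlib-wrapped deflate
    if content_encoding:
        raise ValueError("Unknown Content-Encoding: " + content_encoding)
    return raw


def search_questions(term, site="stackoverflow"):
    """Return the list of question items matching term (hypothetical helper)."""
    req = urllib.request.Request(build_search_url(term, site),
                                 headers={"Accept-Encoding": "gzip"})
    with urllib.request.urlopen(req) as response:
        body = decode_body(response.read(),
                           response.headers.get("Content-Encoding"))
    return json.loads(body)["items"]
```

Usage would be something like `for item in search_questions("AKIAIHXBFL3ATI64QPAQ"): print(item["title"], item["link"])`. Building the query with `urlencode` keeps the search term properly escaped if it ever contains spaces or symbols.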


Note that, regardless, this won't find answers containing the target text. The API is no good for that.

SEDE (the Stack Exchange Data Explorer) can find text in both questions and answers, but its data may be up to one week old.

answered by Brock Adams
    Thank you for the response. I tried SEDE, but it was very inefficient and kept timing out. Google BigQuery has a public Stack Overflow dataset, which was quite efficient but only had data up to March 2019, which is not bad. Thank you for the workaround; it definitely works for questions. – Digvijay Sawant Apr 29 '19 at 18:14