Hacker News has released an API, how do I use it in Python?

I want to get all the top posts. I tried using urllib2, but I don't think I am doing it right.

Here's my code:

import urllib2
response = urllib2.urlopen('https://hacker-news.firebaseio.com/v0/topstories.json?print=pretty')
html = response.read()
print response.read()

It just prints an empty string:

''

I missed a line and have updated my code.

Zoe
shankaran

2 Answers

As @jonrsharpe explained, read() is a one-time operation. So if you print html, you will get the list of all the story IDs. That list contains only IDs, so to get the story behind each one you have to make a separate request per ID.

First convert the received data to a Python list, then go through the IDs one at a time:

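import json
import urllib2

base_url = 'https://hacker-news.firebaseio.com/v0/item/{}.json?print=pretty'
# html is the string returned by response.read() in the question's code
top_story_ids = json.loads(html)
for story in top_story_ids:
    # one request per story ID
    response = urllib2.urlopen(base_url.format(story))
    print response.read()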
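For readers on Python 3, where urllib2 no longer exists, the same two-step approach can be sketched with urllib.request. This is only a minimal sketch: the names fetch_json and top_stories and the limit parameter are my own illustrative choices, not part of the API or of any library.

```python
import json
from urllib.request import urlopen

TOP_URL = 'https://hacker-news.firebaseio.com/v0/topstories.json?print=pretty'
ITEM_URL = 'https://hacker-news.firebaseio.com/v0/item/{}.json?print=pretty'


def fetch_json(url):
    """Download url and decode its JSON body (performs a network request)."""
    with urlopen(url) as response:
        return json.loads(response.read().decode('utf-8'))


def top_stories(limit=5, fetch=fetch_json):
    """Return the first `limit` top stories as decoded JSON values.

    `fetch` is a hypothetical hook (not part of the API) that lets the
    function be tried with canned data instead of real requests.
    """
    story_ids = fetch(TOP_URL)  # first request: the list of IDs
    # then one request per story ID
    return [fetch(ITEM_URL.format(story_id)) for story_id in story_ids[:limit]]
```

With network access, top_stories(limit=3) should give three dicts, and each title can be read with story.get('title'). The limit is there because the full list means one request per ID, which can take a while.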

Instead of all this, you could use haxor, a Python wrapper for the Hacker News API. The following code fetches the IDs of all the top stories:

from hackernews import HackerNews
hn = HackerNews()
top_story_ids = hn.top_stories()
# >>> top_story_ids
# [8432709, 8432616, 8433237, ...]

Then you can loop through that list and print each story, for example:

for story in top_story_ids:
    print hn.get_item(story)

Disclaimer: I wrote haxor.

avi

You should

print html

instead of

print response.read()

Why? Because the read is a one-time operation; after you've done it, you can't repeat it:

>>> import urllib2
>>> response = urllib2.urlopen('https://hacker-news.firebaseio.com/v0/topstories.json?print=pretty')
>>> response.read()
'[ 8445087, 8444739, 8444603, 8443981, 8444976, 8443902, 8444252, 8444634, 8444931, 8444272, 8444025, 8441939, 8444510, 8444640, 8443830, 8445076, 8443470, 8444785, 8443028, 8444077, 8444832, 8443841, 8443467, 8443309, 8443187, 8443896, 8444971, 8443360, 8444601, 8443287, 8441095, 8441681, 8441055, 8442712, 8444909, 8443621, 8442596, 8443836, 8442266, 8443298, 8445122, 8443096, 8441699, 8442119, 8442965, 8440486, 8442093, 8443393, 8442067, 8444989, 8440985, 8444622, 8438728, 8442555, 8444880, 8442004, 8443185, 8444370, 8436210, 8437671, 8439641, 8443727, 8441702, 8436309, 8441041, 8437367, 8422087, 8441711, 8438063, 8444212, 8439408, 8442049, 8440989, 8439367, 8438515, 8437403, 8435278, 8442486, 8442730, 8428522, 8438904, 8443450, 8432703, 8430412, 8422928, 8443635, 8439267, 8440191, 8439560, 8437230, 8442556, 8439977, 8444140, 8441682, 8443776, 8441209, 8428632, 8441388, 8422599, 8439547 ]\n'
>>> response.read()
''

In your case, though, you've assigned the string from read to the name html, so you can still access it.
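The one-time behaviour can be reproduced without touching the network. In this Python 3 sketch, io.BytesIO stands in for the HTTP response. That substitution is an assumption made for illustration, but both are file-like objects that are read sequentially, and the shortened ID list is taken from the output above.

```python
import io

# io.BytesIO is a stand-in for the object returned by urlopen (an assumption,
# so that the example runs offline); it is consumed the same way.
response = io.BytesIO(b'[ 8445087, 8444739, 8444603 ]\n')

html = response.read()    # first read consumes the whole stream
second = response.read()  # nothing is left, so this is empty

print(repr(html))
print(repr(second))
```

Keeping the result of the first read() in a name, as the question's code already does with html, is what makes the data available afterwards.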


Once you have the story IDs, you can access each one via '.../v0/item/{item number}.json?print=pretty':

>>> response = urllib2.urlopen('https://hacker-news.firebaseio.com/v0/item/8445087.json?print=pretty')
>>> print response.read()
{
  "by" : "lalmachado",
  "id" : 8445087,
  "kids" : [ 8445205, 8445195, 8445173, 8445103 ],
  "score" : 21,
  "text" : "",
  "time" : 1413116430,
  "title" : "Show HN: Powerful ASCII art editor designed for the Mac",
  "type" : "story",
  "url" : "http://monodraw.helftone.com/"
}

You should read through the API documentation before continuing. It's also worth getting to grips with the json module.
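As a starting point with the json module, here is a small Python 3 sketch that decodes the two kinds of response shown above. The strings are copied (and shortened) from the output in this answer rather than fetched live, and the variable names are only illustrative.

```python
import json

# Shortened copies of the responses shown above, not live data.
ids_text = '[ 8445087, 8444739, 8444603 ]\n'
item_text = '''{
  "by" : "lalmachado",
  "id" : 8445087,
  "kids" : [ 8445205, 8445195, 8445173, 8445103 ],
  "score" : 21,
  "title" : "Show HN: Powerful ASCII art editor designed for the Mac",
  "type" : "story",
  "url" : "http://monodraw.helftone.com/"
}'''

story_ids = json.loads(ids_text)  # JSON array  -> Python list of ints
item = json.loads(item_text)      # JSON object -> Python dict

print(story_ids[0])
print(item['title'])

# Iterating over the raw string yields single characters, not IDs.
first_chars = list(ids_text[:3])
print(first_chars)
```

The last lines show why looping over the undecoded response does not give story IDs: until json.loads is applied, the data is just a string.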

jonrsharpe
  • I couldnt go through the response list, why? – shankaran Oct 12 '14 at 13:51
  • @shankaran I can't guess your implementation, so it's hard to say. If forced to take a punt, I would say you're probably trying to iterate over the characters in the string, rather than using e.g. `json.loads` to convert the string to a list of integers. – jonrsharpe Oct 12 '14 at 13:52
  • @shankaran I cannot guess what code you're running, so from a problem description as vague as *"couldnt [sic] go through the response list"* it's more-or-less impossible for me to say what the problem is. – jonrsharpe Oct 12 '14 at 13:59
  • @shankaran post the exact code you tried, so we can see what mistakes you are making. Without some piece of code, we cannot guess anything. That's what jonrsharpe is trying to say. – avi Oct 12 '14 at 14:00