0

The only reliable method I have found for downloading text from Wikipedia in a script is cURL. So far the only way I have of running it is to call os.system(). Even though the output appears properly in the Python shell, I can't seem to get the function to return anything other than the exit code (0). Alternatively, could somebody show me how to properly use urllib?

John Fouhy
GameFreak

3 Answers

7

From Dive into Python:

import urllib
sock = urllib.urlopen("http://en.wikipedia.org/wiki/Python_(programming_language)")
htmlsource = sock.read()
sock.close()
print htmlsource

That will print out the source code for the Python Wikipedia article. I suggest you take a look at Dive into Python for more details.

Example using urllib2 from the Python Library Reference:

import urllib2
f = urllib2.urlopen('http://www.python.org/')
print f.read(100)

Edit: Also you might want to take a look at wget.
Edit2: Added urllib2 example based on S.Lott's advice
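For readers on Python 3, where urllib and urllib2 were merged into urllib.request, the same idea might look like the sketch below. The helper name fetch_text and the User-Agent string are made up for illustration; the header is set on the assumption that some sites, Wikipedia possibly among them, reject urllib's default User-Agent.

```python
import urllib.request

def fetch_text(url, user_agent="example-fetcher/0.1"):
    """Download a URL and return its body as text.

    fetch_text and the default user_agent value are hypothetical names
    chosen for this sketch, not part of any library.
    """
    # Attach a custom User-Agent header to the request.
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request) as response:
        raw = response.read()
        # Use the charset declared by the server, falling back to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
    return raw.decode(charset, errors="replace")

# Possible usage (needs network access):
# html = fetch_text("http://en.wikipedia.org/wiki/Python_(programming_language)")
# print(html[:100])
```

urlopen follows redirects by default, so no extra handling is needed for that case.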

Bill the Lizard
Sean
  • Thank you, the built in help browser is almost never understandable. – GameFreak Dec 09 '08 at 01:29
  • urllib2 does almost the same thing, plus it handles things like redirects more gracefully. – S.Lott Dec 09 '08 at 01:43
  • @S.Lott I agree. I was just looking for a resource that GameFreak could learn more from, not just copy code from, and it turned out that the first resource I thought of, Dive into Python, used urllib. – Sean Dec 09 '08 at 01:53
2

To answer the question: Python has a subprocess module, which allows you to interact with spawned processes: http://docs.python.org/library/subprocess.html#subprocess.Popen

It allows you to read the stdout for the invoked process, and even send items to the stdin.
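A minimal sketch of capturing a child's output with subprocess.Popen, written for Python 3. The helper name run_and_capture is invented for this example, and a small Python child process stands in for curl so the snippet runs without curl or network access.

```python
import subprocess
import sys

def run_and_capture(argv):
    """Run a command (given as a list of arguments) and return its stdout as text.

    run_and_capture is a hypothetical helper name used only in this sketch.
    """
    # stdout=PIPE lets the parent read what the child prints, unlike
    # os.system(), which only returns the exit status.
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE)
    out, _ = proc.communicate()  # reads all output and waits for the child to exit
    if proc.returncode != 0:
        raise RuntimeError("command failed with exit code %d" % proc.returncode)
    return out.decode("utf-8", errors="replace")

# Stand-in command: a child Python process that prints one line.
demo = run_and_capture([sys.executable, "-c", "print('hello from child')"])
print(demo)

# With curl installed, the call could look like this instead (-s silences the progress meter):
# html = run_and_capture(["curl", "-s", "http://en.wikipedia.org/wiki/Python_(programming_language)"])
```

Passing the command as a list avoids going through the shell, so the parentheses in the Wikipedia URL need no quoting.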

However, as you said, urllib is a much better option. If you search Stack Overflow, I am sure you will find at least 10 other related questions...

Community
Jake
0

As an alternative to urllib, you could use the libcurl Python bindings.

gnud