The only reliable method I have found for using a script to download text from Wikipedia is with cURL. So far the only way I have of doing that is to call os.system(). Even though the output appears properly in the Python shell, I can't seem to get the function to return anything other than the exit code (0). Alternatively, somebody could show me how to properly use urllib.
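A minimal sketch of the problem (the exact curl command line here is an assumption; only the os.system() call and the 0 return value are from the question):

import os

# os.system() sends curl's output straight to the terminal;
# its return value is only the command's exit status, not the page text
status = os.system('curl -s "http://en.wikipedia.org/wiki/Python_(programming_language)"')
print status   # prints 0 on success, never the downloaded HTML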
3 Answers
From Dive into Python:
import urllib

# open a file-like object for the article URL
sock = urllib.urlopen("http://en.wikipedia.org/wiki/Python_(programming_language)")
# read the entire response body as a string
htmlsource = sock.read()
sock.close()
print htmlsource
That will print out the source code for the Python Wikipedia article. I suggest you take a look at Dive into Python for more details.
Example using urllib2 from the Python Library Reference:
import urllib2

# urllib2 handles things like redirects more gracefully than urllib
f = urllib2.urlopen('http://www.python.org/')
# read and print the first 100 bytes of the response
print f.read(100)
Edit: Also you might want to take a look at wget.
Edit2: Added urllib2 example based on S.Lott's advice

- Thank you, the built-in help browser is almost never understandable. – GameFreak Dec 09 '08 at 01:29
- urllib2 does almost the same thing, plus it handles things like redirects more gracefully. – S.Lott Dec 09 '08 at 01:43
- @S.Lott I agree. I was just looking for a resource that GameFreak could learn more from, not just copy code from, and it turned out that the first resource I thought of, Dive into Python, used urllib. – Sean Dec 09 '08 at 01:53
Answering the question as asked: Python has a subprocess module which allows you to interact with spawned processes: http://docs.python.org/library/subprocess.html#subprocess.Popen
It allows you to read the stdout of the invoked process, and even send items to its stdin.
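A rough sketch of that approach (the curl command line is an assumption, not something from the original post), capturing the page text from curl's stdout:

import subprocess

# run curl and capture its stdout instead of letting it go to the terminal
proc = subprocess.Popen(
    ["curl", "-s", "http://en.wikipedia.org/wiki/Python_(programming_language)"],
    stdout=subprocess.PIPE)
htmlsource, _ = proc.communicate()
print htmlsource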
However, as you said, urllib is a much better option. If you search Stack Overflow, I am sure you will find at least 10 other related questions...