I have a long-standing issue with urllib.request. Here is what I do:

wahlrecht = urllib.parse.quote("http://www.wahlrecht.de/umfragen/")
page = urllib.request.urlopen(wahlrecht)

Here's the full traceback I get:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/d069049/Documents/BCCN/Predictor/Backend/wahlrecht_polling_firms.py", line 72, in get_tables
    page = urllib.request.urlopen(wahlrecht)
  File "/Users/d069049/anaconda/envs/bccn2017/lib/python3.6/urllib/request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "/Users/d069049/anaconda/envs/bccn2017/lib/python3.6/urllib/request.py", line 511, in open
    req = Request(fullurl, data)
  File "/Users/d069049/anaconda/envs/bccn2017/lib/python3.6/urllib/request.py", line 329, in __init__
    self.full_url = url
  File "/Users/d069049/anaconda/envs/bccn2017/lib/python3.6/urllib/request.py", line 355, in full_url
    self._parse()
  File "/Users/d069049/anaconda/envs/bccn2017/lib/python3.6/urllib/request.py", line 384, in _parse
    raise ValueError("unknown url type: %r" % self.full_url)
ValueError: unknown url type: 'http%3A//www.wahlrecht.de/umfragen/'
ben0it8
    Well, uh... why are you quoting the url before opening it? Of course that would make it invalid... – Aran-Fey May 30 '17 at 09:20

2 Answers

You're percent-encoding the whole URL, including the ":" after the scheme, so urlopen receives http%3A//... and no longer recognizes it as an http URL. You need to open the URL without quoting it. The following line causes your problem:

wahlrecht = urllib.parse.quote("http://www.wahlrecht.de/umfragen/")

If you change it to

wahlrecht = "http://www.wahlrecht.de/umfragen/"

it should work.
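
To see why the quoted string is rejected, here is a minimal sketch that needs no network access. It uses urllib.parse.urlsplit only as an illustration of how the scheme gets lost; the variable names are my own and not from the question.

```python
from urllib.parse import quote, urlsplit

url = "http://www.wahlrecht.de/umfragen/"

# quote() treats the whole string as a path segment, so ":" is escaped.
quoted = quote(url)
print(quoted)

# The original URL still has a recognizable scheme ...
print(repr(urlsplit(url).scheme))

# ... but the quoted string contains no ":" at all, so no scheme is found.
print(repr(urlsplit(quoted).scheme))
```

Passing the unquoted `url` to urllib.request.urlopen avoids the ValueError shown in the traceback.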

Isdj

urllib.parse.quote() has a default parameter safe='/', which "specifies additional ASCII characters that should not be quoted". In your url

"http://www.wahlrecht.de/umfragen/",

the ":" after "http" is quoted and replaced by "%3A", so it becomes

"http%3A//www.wahlrecht.de/umfragen/",

which causes the error, because urlopen no longer sees an "http:" scheme. You can add ":" to the safe parameter so that it is not quoted. For example, use urllib.parse.quote("http://www.wahlrecht.de/umfragen/", safe=':/').
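
As a rough sketch of both options: the first call keeps ":" safe as described above, and the second quotes only the path component so the scheme and host are never touched. The helper name quote_path_only and the example path "some page" are hypothetical, made up for illustration.

```python
from urllib.parse import quote, urlsplit, urlunsplit

url = "http://www.wahlrecht.de/umfragen/"

# Option 1: keep ":" (and "/") unquoted.
kept = quote(url, safe=":/")
print(kept)

# Option 2 (hypothetical helper): quote only the path part of the URL.
def quote_path_only(raw_url):
    parts = urlsplit(raw_url)
    # SplitResult is a namedtuple, so _replace returns a modified copy.
    return urlunsplit(parts._replace(path=quote(parts.path)))

# "some page" is a made-up path used only to show the escaping.
print(quote_path_only(url + "some page"))
```

Quoting only the path is usually the safer choice when the URL might also contain a port or query string, since those parts have their own escaping rules.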

Yue Zhao