I am trying to request a URL like http://example.com/?param=%DD%CC%C0-15 with the Python requests library, like this:

group = "ЭМА-15".encode('cp1251')
r = requests.get('http://example.com/?param=' + group)
r.encoding = "cp1251"

(The site uses the windows-1251 (cp1251) encoding.)

I get an error at line 2: UnicodeDecodeError: 'utf8' codec can't decode byte 0xdd in position 82: invalid continuation byte. But this sequence of bytes (0xDD (%DD)...) is exactly what I need. How can I fix this?

Vlad Markushin

2 Answers

I guess you are trying to display cp1251 characters while your editor is configured to use UTF-8. The "coding: cp1251" declaration is only used by the Python interpreter to decode characters outside the ASCII range in Python source files. Try:

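# -*- coding: utf-8 -*-
import requests
# Python 2: decode the UTF-8 source bytes to unicode, then re-encode as cp1251
group = "ЭМА-15".decode('utf8').encode('cp1251')
r = requests.get('http://example.com/?param=' + group)
r.encoding = "cp1251"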

When I run this in my terminal, I get:

>>> "ЭМА-15".decode('utf8').encode('cp1251')
'\xdd\xcc\xc0-15'
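
For readers on Python 3, where str is already Unicode and has no .decode method, the same conversion might look like the sketch below. It assumes Python 3 and uses only built-ins; the name decode_error is just for illustration. It also shows that these bytes are not valid UTF-8, which is what the error in the question reports.

```python
# Assumes Python 3: str literals are Unicode, so no .decode('utf8') step is needed.
group = "ЭМА-15".encode("cp1251")
print(group)  # the cp1251 bytes for the group name

# The same bytes are not valid UTF-8, so decoding them as UTF-8 raises
# a UnicodeDecodeError similar to the one in the question.
decode_error = None
try:
    group.decode("utf-8")
except UnicodeDecodeError as exc:
    decode_error = str(exc)
    print(decode_error)
```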
Ruhul Amin

There are two things:
1. The Python interpreter needs to know the encoding of the "ЭМА-15" string in the source.
2. Query parameters are usually handled by requests, but since you are constructing the URL manually, it's best to quote it yourself.

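# -*- coding: utf-8 -*-
# Python 2 (on Python 3, quote_plus lives in urllib.parse)
import urllib
import requests

# Encode the unicode literal to cp1251 bytes, then percent-encode those bytes
group = u"ЭМА-15".encode('cp1251')
param = urllib.quote_plus(group)
print(param)
r = requests.get('http://example.com/?param=' + param)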

Output

%DD%CC%C0-15
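
On Python 3 the same idea can be sketched with urllib.parse.quote_plus, which accepts an encoding argument, so the separate .encode('cp1251') step is not needed. This is a minimal sketch that assumes Python 3 and reuses the example.com URL from the question; it only builds the URL and sends no request.

```python
# Assumes Python 3; only the standard library is used to build the URL.
from urllib.parse import quote_plus

# quote_plus encodes the text with the given codec before percent-encoding it.
param = quote_plus("ЭМА-15", encoding="cp1251")
url = "http://example.com/?param=" + param
print(url)
```

The resulting url string can then be passed to requests.get(url), as in the code above.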
Kenji Noguchi