>>> dir = u'\\\\nas\\cut\\'
>>> cutter = "seleção"
>>> ext = ".cf2"
>>> path = dir+cutter+ext

Traceback (most recent call last):
  File "<pyshell#8>", line 1, in <module>
    path = dir+cutter+ext
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe7 in position 4: ordinal not in range(128)

I need this path in order to open the file:

f = open(path, 'r')

I don't know how to concatenate these strings correctly. The variable dir must be unicode because I use configparser to parse the value from a .ini file, and it returns unicode strings.

Ricardo Reis

2 Answers

Decode your bytes to a unicode string, explicitly:

path = dir + cutter.decode('utf8') + ext.decode('utf8')

Note that you should really use the os.path.join() function to build paths:

import os
path = os.path.join(dir, cutter.decode('utf8') + ext.decode('utf8'))

This assumes your terminal or console is configured for UTF-8; rather than hard-coding 'utf8', it is better to use sys.stdin.encoding here. For data sourced from elsewhere, determine the codec for that source first.
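A minimal sketch of that advice, written so it runs on both Python 2 and Python 3. The helper name to_text and its UTF-8 fallback are assumptions made for illustration, not part of the answer above; the fallback is there because sys.stdin.encoding can be None when input does not come from a terminal.

```python
import sys


def to_text(value, encoding=None):
    """Return value as a unicode string, decoding byte strings explicitly.

    to_text is a hypothetical helper; the UTF-8 fallback is an assumption.
    """
    if isinstance(value, bytes):
        # sys.stdin.encoding may be None (for example when stdin is a pipe).
        codec = encoding or getattr(sys.stdin, "encoding", None) or "utf-8"
        return value.decode(codec)
    return value  # already a unicode string, leave it untouched


directory = u"\\\\nas\\cut\\"
cutter = b"sele\xc3\xa7\xc3\xa3o"  # UTF-8 bytes of the name from the question
ext = b".cf2"

# Every operand is unicode before concatenation, so no implicit ASCII decode.
path = directory + to_text(cutter, "utf-8") + to_text(ext, "utf-8")
```

Passing the codec explicitly, as in the last line, is the safer choice whenever the source of the bytes is known; the sys.stdin.encoding lookup only makes sense for text typed at the console.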

Martijn Pieters

If your filename and extension are constants, as in your example, just use Unicode strings for everything:

>>> dir = u'\\\\nas\\cut\\'
>>> cutter = u"seleção"
>>> ext = u".cf2"
>>> path = dir+cutter+ext

If they are not constants and are byte strings, .decode() them with an appropriate encoding. Which encoding is appropriate depends on the OS and on where the bytes came from.

Note that some APIs, like os.listdir() and glob.glob(), return Unicode strings when given a Unicode argument.
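To illustrate, here is a small sketch that creates a file with a non-ASCII name in a temporary directory and lists it back. The temporary directory and the file name are made up for the example, and it assumes the filesystem can store that name.

```python
import os
import shutil
import sys
import tempfile

tmp_dir = tempfile.mkdtemp()
if isinstance(tmp_dir, bytes):
    # Python 2 returns a byte string here; decode it so the argument
    # handed to os.listdir() below is unicode.
    tmp_dir = tmp_dir.decode(sys.getfilesystemencoding())

try:
    name = u"sele\u00e7\u00e3o.cf2"  # illustrative file name
    with open(os.path.join(tmp_dir, name), "w") as handle:
        handle.write("demo")

    # Unicode argument in, unicode entries out.
    listed = os.listdir(tmp_dir)
    all_text = all(not isinstance(entry, bytes) for entry in listed)
finally:
    shutil.rmtree(tmp_dir)  # clean up the throwaway directory
```

Because the entries come back as unicode, they can be joined with other unicode path parts without any further decoding.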

Mark Tolonen