
I'm really new to Python coding, so please answer in detail and don't be too harsh. I'm trying to replace the German letter 'ß' with 'ss' in the attribute table of a shapefile. I'm doing this with the field calculator, where you can add a Python code block.

This is what I've tried so far:

def ecode(file, name, test):
    test.decode("utf-8")
    test.replace("\xe1", "ss")
    test.encode("utf-8")
    return test

Instead of "\xe1" I've also used "U+00DF" and "\xdf".

This error message occurs:

ERROR: ascii codec can't encode character u'\xdf' in position 11: ordinal not in range(128)

The street name in this field of the attribute table is 'Zuccalistraße 21a', so obviously the problem is ß, which lies outside the ASCII range (its code point is 223, well above 127). What can I do to replace it? I've been searching the internet for 5 hours now...

Would love to get some answers! Kind regards, Ayla

Ayla Lyra
  • Hi @AylaLyra, you need to assign the return values of these functions to a variable for them to have any effect, since they do not work in-place! Check my answer below :) – Devesh Kumar Singh May 16 '19 at 07:56

3 Answers


decode, encode and replace do not work in-place; each of them returns a new string. Try test = test.decode('utf-8'), test = test.replace("\xe1", "ss") and test = test.encode('utf-8').

In your code the decode and replace lines have no effect on test, because their results are thrown away. The third line then tries to encode an object that was never decoded, so it doesn't work.
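To make the "not in-place" point concrete, here is a minimal sketch. The variable names are only illustrative, and the `u"..."` literals are used so the snippet should behave the same under Python 2.7 and Python 3.

```python
# Sketch: string methods return a new object and leave the original untouched.
name = u"Zuccalistra\xdfe 21a"  # u"\xdf" is the escape for the letter 'ß'

# The result of this call is thrown away, so `name` still contains 'ß'.
name.replace(u"\xdf", u"ss")
still_original = (name == u"Zuccalistra\xdfe 21a")

# Keeping the return value is what actually gives you the changed string.
replaced = name.replace(u"\xdf", u"ss")

print(still_original)
print(replaced)
```

The same applies to decode and encode: only the value they return carries the change.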

That said, you would still have a problem after that, because "\xe1" is the code for á, not ß, and the search string should be a unicode literal. Here is what I would do:

test = test.decode("utf-8")
test = test.replace(u"\xdf", "ss")
test = test.encode("utf-8")

or

test = test.decode("utf-8")
test = test.replace(u"ß", "ss")
test = test.encode("utf-8")

whichever looks the most readable to you.

You could also skip the decode/encode step and just do test = test.replace(u"\xdf".encode("utf-8"), "ss") or test = test.replace("ß", "ss"), but it is generally better to work with unicode objects, so I would say decoding and encoding is good practice.
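As a rough illustration of the byte-level variant, the sketch below replaces the UTF-8 byte sequence of 'ß' directly. It assumes the data really is UTF-8 encoded; in Latin-1, for example, 'ß' is the single byte `\xdf` and this replacement would not match. The variable names are made up for the example.

```python
# Sketch: replacing 'ß' in UTF-8 encoded bytes without decoding first.
eszett_bytes = u"\xdf".encode("utf-8")  # the two bytes b"\xc3\x9f"

# Stand-in for a UTF-8 encoded value read from somewhere else.
raw = u"Zuccalistra\xdfe 21a".encode("utf-8")

# Both arguments are byte strings, so no implicit conversion is involved.
fixed = raw.replace(eszett_bytes, b"ss")

print(fixed.decode("utf-8"))  # Zuccalistrasse 21a
```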

Ashargin

In Python 3, you can use a combination of casefold and capitalize:

In [6]: s = 'Zuccalistraße 21a'

In [7]: s.casefold()
Out[7]: 'zuccalistrasse 21a'

In [8]: s.casefold().capitalize()
Out[8]: 'Zuccalistrasse 21a'
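One thing to keep in mind: casefold lowercases the whole string and capitalize only restores the first character, so other capital letters are lost. The Python 3 sketch below compares this with a plain replace, using a made-up street name that has more than one capital letter.

```python
# Python 3 sketch; the street name is a hypothetical example.
s = "Alte Landstra\u00dfe 21A"  # "\u00df" is the letter 'ß'

# casefold() lowercases everything, capitalize() restores only the first letter.
folded = s.casefold().capitalize()

# replace() touches only the 'ß' and keeps the original capitalization.
replaced = s.replace("\u00df", "ss")

print(folded)    # Alte landstrasse 21a
print(replaced)  # Alte Landstrasse 21A
```

For a name like 'Zuccalistraße 21a' both approaches give the same result, but for attribute values with several capitalized words a plain replace is probably the safer choice.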

For Python 2, decode, replace and encode are not in-place functions; they return a value, so you need to assign the return value to a variable to make your code work.

Also note the # coding=utf-8 declaration at the top of the script below. This is in accordance with PEP 263.

# coding=utf-8

s = 'Zuccalistraße 21a'
s = s.decode("utf-8").replace(u"\xdf", "ss").encode("utf-8")
print(s)

The output will be

Zuccalistrasse 21a
Devesh Kumar Singh
  • Hi Devesh! Thanks for your help. I'll write my comment in an answer below :) – Ayla Lyra May 21 '19 at 09:04
  • If you check the answer again, I have mentioned that the first solution is `python3`, and the second solution is `python2` @AylaLyra – Devesh Kumar Singh May 21 '19 at 09:05
  • Ahh I see that's why it is not working with casefold. – Ayla Lyra May 21 '19 at 09:08
  • Great glad to help :) If the answer helped you, please consider marking the answer as accepted by clicking the tick next to the answer. Also please consider reading https://stackoverflow.com/help/someone-answers – Devesh Kumar Singh May 21 '19 at 09:09
  • It helps, really, but still I'm with the same problem :( sorry for that. The decoding is not working.. – Ayla Lyra May 21 '19 at 09:13
  • Please paste the last code as it is, it worked for me in python2.7 @AylaLyra – Devesh Kumar Singh May 21 '19 at 09:19
  • you can see it in my answer below :) – Ayla Lyra May 21 '19 at 11:21
  • You didn't paste my code entirely, because I have done the same thing you did in a simpler way @AylaLyra :) , as I have a line in my code `# coding=utf-8` , which when you paste on top and run the code, will perform the same thing. I would suggest to try my code again with this information, when you run it `python script.py`, you get the output `Zuccalistrasse 21a` – Devesh Kumar Singh May 21 '19 at 11:28

So obviously, it's a problem with the decoding. When I try

def ecode(file, name, test):
    test=test.decode("utf-8")
    test=test.replace(u"\xdf", "ss")
    test=test.encode("utf-8")
    return test

I get the error message:

File "C:\Python27\ArcGIS10.2\Lib\encodings\utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)

UnicodeEncodeError: 'ascii' codec can't encode character u'\xdf' in position 11: ordinal not in range(128)
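A possible reading of this traceback: in Python 2, calling decode on a value that is already a unicode object first encodes it implicitly with the ASCII codec, and that step raises exactly this kind of UnicodeEncodeError. That would suggest the field calculator passes `test` in as unicode, although this is an assumption based only on the error message. Under that assumption, a sketch like the following decodes only when it actually receives bytes. The helper name `replace_eszett` is hypothetical, and the function returns text rather than encoded bytes.

```python
def replace_eszett(value):
    """Return value as text with every 'ß' replaced by 'ss'."""
    if isinstance(value, bytes):
        # Byte strings are assumed to be UTF-8 encoded (an assumption).
        value = value.decode("utf-8")
    # Text (unicode) values are used as they are, with no decode step.
    return value.replace(u"\xdf", u"ss")


print(replace_eszett(u"Zuccalistra\xdfe 21a"))                  # text input
print(replace_eszett(u"Zuccalistra\xdfe 21a".encode("utf-8")))  # bytes input
```

Both calls should print Zuccalistrasse 21a. In Python 2.7 `bytes` is an alias for `str`, so the same check should work there as well.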

I have now found a solution to the problem. When you add

import sys
reload(sys)
sys.setdefaultencoding("utf8")

to the function, it works fine!! So thanks for trying to help me, have a nice day :)

Cheers, Ayla

Ayla Lyra
  • You didn't paste my code entirely, because I have done the same thing you did in a simpler way @AylaLyra :) , as I have a line in my code `# coding=utf-8` , which when you paste on top and run the code, will perform the same thing. I would suggest to try my code again with this information, when you run it `python script.py`, you get the output `Zuccalistrasse 21a` – Devesh Kumar Singh May 22 '19 at 12:35