This should be an easy one, I hope. I have a URL:
http://uploads4.wikiart.org/images/marc-chagall/kopeikin-and-napol%C3%A9on.jpg
that is saved into a JSON file with this code:
import json

paintings = get_all_paintings(marc_chagall)
with open('chagall.json', 'w') as fb:
    # json.dump writes to the file and returns None, so there is nothing to assign
    json.dump(paintings, fb)
In the file, the URL has become:
u'http://uploads4.wikiart.org/images/marc-chagall/kopeikin-and-napol\xe9on.jpg'
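For context, here is a minimal sketch of how the `json` module represents that character. It is written in Python 3 syntax as an assumption (the code in this question is Python 2), and the variable names `url`, `escaped` and `literal` are only illustrative. The `u'...\xe9...'` form is how `repr` displays a unicode string in Python 2; the text inside a JSON file uses either a `\u00e9` escape or the character itself, depending on `ensure_ascii`.

```python
import json

# The URL as a text string; \xe9 is the single character e-acute
url = u'http://uploads4.wikiart.org/images/marc-chagall/kopeikin-and-napol\xe9on.jpg'

# Default (ensure_ascii=True): the character is written as a \u00e9 escape
escaped = json.dumps(url)

# ensure_ascii=False: the character itself is kept in the JSON text
literal = json.dumps(url, ensure_ascii=False)

# Both JSON forms load back to the same text string
assert json.loads(escaped) == json.loads(literal) == url
```

Either way, the loaded value is the same string, so the escape is only a matter of how the character is written out.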
I am able to get the original, usable, percent-encoded URL with this code:
import urllib
p = u'http://uploads4.wikiart.org/images/marc-chagall/kopeikin-and-napol\xe9on.jpg'
p = urllib.quote(p.encode('utf8'), safe='/:')
print repr(p)
> 'http://uploads4.wikiart.org/images/marc-chagall/kopeikin-and-napol%C3%A9on.jpg'
Now comes the tricky part. I want to get this string:
http://uploads4.wikiart.org/images/marc-chagall/kopeikin-and-napoléon.jpg
with the non-ASCII character in "napoléon" intact. This is only for naming the object in the storage bucket, nothing else. How can I produce this string?
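One possible direction, as a rough sketch rather than a definitive answer: percent-decoding the URL as UTF-8 should yield text with the character intact. The sketch below assumes Python 3, where the function is `urllib.parse.unquote`; in Python 2 the rough equivalent would be `urllib.unquote(s).decode('utf8')` on a byte string. The names `encoded`, `readable` and `loaded` are illustrative only.

```python
from urllib.parse import quote, unquote

# The percent-encoded URL from the top of the question
encoded = 'http://uploads4.wikiart.org/images/marc-chagall/kopeikin-and-napol%C3%A9on.jpg'

# unquote() decodes %XX escapes as UTF-8 by default and returns text
readable = unquote(encoded)
print(readable)

# The string loaded from the JSON file is already this same text;
# repr only displays the character as the escape \xe9
loaded = u'http://uploads4.wikiart.org/images/marc-chagall/kopeikin-and-napol\xe9on.jpg'
assert loaded == readable

# Round trip: quoting the readable form gives back the percent-encoded URL
assert quote(readable, safe='/:') == encoded
```

If this holds, the `\xe9` seen earlier is just how `repr` shows the string, and printing the string directly (or encoding it as UTF-8 when a byte string is needed for the bucket key) should give the readable name.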