I am making a little Python script for mass-editing of HTML files (replacing links to images etc.). Now, the HTML files contain some Cyrillic, that means I have to encode the string UTF-8. I replace all the links in the HTML, and type tag.set(data) and BOOM, the console displays:
ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters.
How can I fix this? I'm pretty sure that there aren't any control characters or NULL bytes. I'm using Python 2.7.11.
value = tag.get('value').encode('utf-8')
#h = HTMLParser.HTMLParser()
#value = h.unescape(value)
urls = regex.finditer(value)
if urls is None: continue
for turl in urls:
ufile = turl.group().rsplit('/', 1)[-1]
value = value.replace(turl.group(), '/'+newsrc+'/'+ufile)
#value = cgi.escape(value, True)
value = value.replace('\0', '')
tag.set('value', value)