I am using a regex to replace quotes within in an input string. My data contains two 'types' of quotes -
" and “
There's a very subtle difference between the two. Currently, I am explicitly mentioning both these types in my regex
\"*\“*
I am afraid though that in future data I may get a different 'type' of quote on which my regex may fail. How many different types of quotes exist? Is there way to normalize these to just one type so that my regex won't break for unseen data?
Edit -
My input data consists of HTML files and I am escaping HTML entities and URLs to ASCII
escaped_line = HTMLParser.HTMLParser().unescape(urllib.unquote(line.decode('ascii','ignore')))
where line specifies each line in the HTML file. I need to 'ignore' the ASCII as all files in my database don't have the same encoding and I don't know the encoding prior to reading the file.
Edit2
I am unable to do so using replace function. I tried replace('"','') but it doesn't replace the other type of quote '“'. If I add it in another replace function it throws me NON-ASCII character error.
Condition
No external libraries allowed, only native python libraries could be used.