As asked by @albert, I'm posting the solution I found. Using "urllib"'s "quote" methode, you can encode latin and charactes like " ", "(" and all other special characters. Since "quote" will convert "http:" to "http%3A" which is not desired, It was mandatory to split the url and only convert the wanted part. Another thing that you should consider is if the urls already partially or completely encoded, in this case, the url may contain some utf8 coded characters, which would contain "%", the quote will proceed "%" as a special character and wil convert it to "%25" which will mess the urls to non returning mess !
Example of the case:
If the url is url = "http://something/cóntaining space song name.mp3"
If the url is already partially encoded (e.g " " will be "%20"), then the current url may look like this
url = "http://something/cóntaining%20space%20song%20name.mp3"
urllib.quote(url) will give (let's assume that "http:" is not converted to "http:%3A") the urllib.quote will give:
"http://something/c%C3%B3ntaining%2520space%2520song%2520name.mp3"
The result is a mess !
With that being said; we can't split the url into "http:" and the rest of it and then apply "quote" to the second part of the url.
So the solution; Encode these special characters one by one; replace each latin or special character with its utf code. Then comes the question "How ?"
It is painful to try if each url contains a character of a list made of these characters (another thing, if the url is unicode you can't use url.find("ó")), Then here comes the tricks ! The problem is the solution !
Finding the latin and the special characters ! how to find them ?! WITH THE EXCEPTION !
If urls (containing bad characters) are of type "unicode" converting them to string will raise an exception
If the urls (containing bad characters) are of type "str" converting them to unicode will raise an exception
We find the wanted characters with the exception ;-)
Then split the url at the position of that character, quote the charcters and at the end rebuild the url.
For my case, urls are unicode:
import sys
import urllib
from core.models import Song
songs = Song.objects.all()
for song in songs:
try:
x = str(song.song_url) #will cause exception with urls containing bad characters
except(UnicodeEncodeError):
k = sys.exc_info()
pos = k[1][2] #getting the position of the bad character
c = song.song_url[pos].encode("utf8")
q = urllib.quote(c)
p1 = song.song_url[:pos] #splitted part one
p2 = song.song_url[pos+1:] #splitted part two
res = p1 + q + p2 #rebuit url
song.song_url = res
song.save()
print res
Note if the url contains several "bad" characters, the above code will treat the first one in each url, so whether execute it in a recursive manner or run it several times until you get no ouput.
I wish this helps.
Generic example where url is of type "str":
import sys
import urllib
url = "https://something.s3.amazonaws.com/music/something/thisó.mp3"
try:
x = unicode(url)
except(UnicodeDecodeError):
k = sys.exc_info()
pos = k[1][2]
url2 = url.decode('utf8')
c = url2[pos].encode("utf8")
q = urllib.quote(c)
p1 = url2[:pos]
p2 = url2[pos+1:]
res = p1 + q + p2
print res
I wish the solution is helpful for anyone who come accross.