There is a big file. Each line of this file is URL typed in by human so there can be different problems like missing http
missing www
etc.
Is there a Python module which can repair those urls? I've tried url_fix
from werkzeug.urls
but it is not exactly what I'm searching for.
www.example.com >> http://www.example.com/
Of course that there can't be method which can repair every possible mistake but I'm looking for repairing most common mistakes.
Have you any advice?
EDIT: According to Peter Wood's comment, let's presume that URL has to contain www
. In my case, those are eshop URLs.