7

There is a big file. Each line of this file is URL typed in by human so there can be different problems like missing http missing www etc.

Is there a Python module which can repair those urls? I've tried url_fix from werkzeug.urls but it is not exactly what I'm searching for.

www.example.com >> http://www.example.com/

Of course that there can't be method which can repair every possible mistake but I'm looking for repairing most common mistakes.

Have you any advice?

EDIT: According to Peter Wood's comment, let's presume that URL has to contain www. In my case, those are eshop URLs.

Milano
  • 18,048
  • 37
  • 153
  • 353

0 Answers0