
I have code that loops through a list of URLs to do some operations, but each entered URL must contain a query string. I want to check first that each URL is valid and actually contains a query string. I searched, and most of the regular expressions I found only validate the URL itself; the closest solution I found uses urlparse, like this:

#!/usr/local/bin/python2.7

from urlparse import urlparse
line = "http://www.compileonlinecom/execute_python_online.php?q="
o = urlparse(line)
print o
# ParseResult(scheme='http', netloc='www.compileonlinecom', path='/execute_python_online.php', params='', query='q=', fragment='')

# Accept only http URLs whose query string is non-empty
if o.scheme == 'http' and o.query != '':
    print "yes, that is a url with a query string"
else:
    print "No match!!"
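For reference, a minimal Python 3 sketch of the same check (the `urlparse` module became `urllib.parse` in Python 3). It uses `urlsplit`, a sibling of `urlparse` that skips the rarely used `;params` split. The helper name `has_query_string` is hypothetical, and accepting `https` and requiring a non-empty host are assumptions beyond the original `o.scheme=='http'` test:

```python
from urllib.parse import urlsplit


def has_query_string(url):
    """Return True for an http/https URL with a host and a non-empty query string."""
    parts = urlsplit(url)
    return (parts.scheme in ("http", "https")  # assumption: also accept https
            and bool(parts.netloc)              # require a host
            and bool(parts.query))              # '' when there is no "?..." part


line = "http://www.compileonlinecom/execute_python_online.php?q="
if has_query_string(line):
    print("yes, that is a url with a query string")
else:
    print("No match!!")
```

Because `parts.query` is an empty string when nothing follows the `?`, a URL ending in a bare `?` is rejected as well.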

But I wonder whether it could be done with a more solid regex.
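If a regex is really wanted, one possible pattern is sketched below (Python 3). It is a sketch under stated assumptions, not a full URL validator: it only accepts `http`/`https`, treats everything up to the first `/`, `?` or `#` as the host, and requires at least one character after `?`. The names `QUERY_URL_RE` and `looks_like_query_url` are hypothetical.

```python
import re

# Hypothetical pattern, written with re.VERBOSE so each part can be commented.
QUERY_URL_RE = re.compile(
    r"""
    https?://        # scheme: http or https only (assumption)
    [^\s/?#]+        # host, optionally with :port
    [^\s?#]*         # optional path
    \?[^\s#]+        # question mark followed by a non-empty query
    (?:\#\S*)?       # optional fragment
    """,
    re.VERBOSE,
)


def looks_like_query_url(url):
    """Return True if the whole string matches QUERY_URL_RE."""
    return QUERY_URL_RE.fullmatch(url) is not None


for url in ["http://www.compileonlinecom/execute_python_online.php?q=",
            "http://normalsite.com",
            "http://site.com/index.php?"]:
    print(url, looks_like_query_url(url))
```

`fullmatch` is used instead of `^...$` anchors so a trailing newline cannot slip through.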

  • Did you find the presented urlparse solution not solid enough? Or do you have homework that requires a regexp? – Jan Vlcinsky May 06 '14 at 16:58
  • The first one. I think urlparse is a less optimized solution than a regex, and I would still have to do a match after the parse. – Mohamed abdelatty May 08 '14 at 23:20
  • Optimizing for speed? Compare them after real measurement, and only if these fractional differences really matter. Optimizing for correct parsing? With a `regexp` you are likely to debug problems that `urlparse` has already hit and resolved. Optimizing for memory? Compare it; I do not think you will find a significant difference even when using both packages together. – Jan Vlcinsky May 08 '14 at 23:27
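Following the advice to measure, a rough `timeit` comparison in Python 3 might look like this. The regex, the helper names, and the generated URLs are all hypothetical; timings vary by machine and Python version, so treat the output only as a starting point:

```python
import re
import timeit
from urllib.parse import urlsplit

# Hypothetical test data: distinct URLs, so any internal caching inside
# urlsplit is not hit over and over by the same few strings.
URLS = ([f"http://site{i}.com/page.php?id={i}" for i in range(1000)]
        + [f"http://site{i}.com/" for i in range(1000)])

# Hypothetical rough regex counterpart of the urlsplit check below.
PATTERN = re.compile(r"https?://[^\s/?#]+[^\s?#]*\?[^\s#]+(?:#\S*)?")


def check_with_urlsplit(url):
    parts = urlsplit(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc) and bool(parts.query)


def check_with_regex(url):
    return PATTERN.fullmatch(url) is not None


# Only compare speed once both checkers agree on the data.
assert [check_with_urlsplit(u) for u in URLS] == [check_with_regex(u) for u in URLS]

for name, func in [("urlsplit", check_with_urlsplit), ("regex", check_with_regex)]:
    # Bind func as a default argument so each lambda times its own checker.
    seconds = timeit.timeit(lambda f=func: [f(u) for u in URLS], number=20)
    print(f"{name}: {seconds:.4f} s for {20 * len(URLS)} checks")
```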

1 Answer


You can try to validate it on the question mark, as every URL with parameters should have a question mark in it.

Example:

sites = ['site.com/index.php?id=1', "xyz.com/sf.php?df=22", "dfd.com/sdgfdg.php?ereg=1", "normalsite.com"]
for site in sites:
    if "?" in site:
        print site

Result:

site.com/index.php?id=1
xyz.com/sf.php?df=22
dfd.com/sdgfdg.php?ereg=1

You can see that the site without parameters was not printed.
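A bare `"?"` test also accepts strings whose query is empty or whose `?` sits inside the fragment. The Python 3 sketch below contrasts it with `urllib.parse.urlsplit`, which splits off the `#fragment` before looking for the query. The last two test strings and the helper names `naive_check` and `query_check` are hypothetical additions:

```python
from urllib.parse import urlsplit

sites = ['site.com/index.php?id=1', "xyz.com/sf.php?df=22",
         "dfd.com/sdgfdg.php?ereg=1", "normalsite.com",
         "normalsite.com/page?",          # hypothetical: "?" with an empty query
         "normalsite.com/page#faq?x=1"]   # hypothetical: "?" only in the fragment


def naive_check(site):
    # The approach from the answer: any "?" anywhere counts.
    return "?" in site


def query_check(site):
    # urlsplit removes the "#fragment" first, then splits off the query;
    # an absent or empty query comes back as ''.
    return bool(urlsplit(site).query)


for site in sites:
    print(site, naive_check(site), query_check(site))
```

For the first four entries both checks agree; for the last two, `naive_check` returns `True` while `query_check` returns `False`.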