Regular expression for filtering a URL with query strings / parameters in Python

Question

I have code that loops through a list of URLs to do some operations, but each entered URL must contain a query string. I want to check first that the URL is correct and actually contains a query string. I searched, and most of the regular expressions I found only validate the URL itself. The closest solution I found uses urlparse, like this:

#!/usr/local/bin/python2.7

from urlparse import urlparse

line = "http://www.compileonlinecom/execute_python_online.php?q="
o = urlparse(line)
print o
# ParseResult(scheme='http', netloc='www.compileonlinecom', path='/execute_python_online.php', params='', query='q=', fragment='')

# Accept only http URLs whose query component is non-empty
if o.scheme == 'http' and o.query != '':
    print "yes, that is a url with query string"
else:
    print "No match!!"

But I wonder whether it could be done with a more robust regex.

Did you find the presented solution with urlparse not solid enough? Or do you have homework that requires doing it with a regexp? – Jan Vlcinsky, May 06 '14 at 16:58
The first one. I think urlparse is a less optimized solution than a regex, and I would still have to do a match after the parse. – Mohamed abdelatty, May 08 '14 at 23:20
Optimize for speed? You should compare them only after a real measurement, and only if these fractional differences really matter. Optimize for correct parsing? With `regexp` you are likely to debug problems that `urlparse` has already hit and resolved. Optimize for memory? Compare it; I do not think you will find a significant difference even when using both packages together. – Jan Vlcinsky, May 08 '14 at 23:27
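
A measurement along the lines suggested in the comment above could look like the following minimal sketch. The pattern `QUERY_URL_RE` and the helper names are hypothetical and not from the thread; the regex is only a rough stand-in for whatever pattern would really be used, and the import fallback is there so the same code runs on Python 2.7 (as in the question) and on Python 3.

```python
import re
import timeit

try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse      # Python 2.7, as in the question

# Hypothetical pattern (an assumption, not from the thread): http scheme,
# a host, an optional path, then '?' followed by at least one character.
QUERY_URL_RE = re.compile(r'^http://[^/?#\s]+(?:/[^?#\s]*)?\?[^#\s]+')


def has_query_urlparse(url):
    """Same test as in the question: http scheme and a non-empty query."""
    parts = urlparse(url)
    return parts.scheme == 'http' and parts.query != ''


def has_query_regex(url):
    """Regex-based variant of the same test."""
    return QUERY_URL_RE.match(url) is not None


url = "http://www.compileonlinecom/execute_python_online.php?q="

# The two checks should agree before their speed is compared.
assert has_query_urlparse(url) == has_query_regex(url)

for check in (has_query_urlparse, has_query_regex):
    seconds = timeit.timeit(lambda: check(url), number=10000)
    print("%s: %.4f s per 10000 calls" % (check.__name__, seconds))
```

The numbers depend on the machine and the Python version, so no expected output is shown. Timing a single repeated URL may also flatter urlparse, since the standard library caches recent parse results; a list of distinct URLs would give a fairer picture.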

score 0 · Answer 1 · answered May 07 '14 at 02:22

You can try validating on the question mark, since every URL with parameters should contain a question mark.

Example:

sites = ["site.com/index.php?id=1", "xyz.com/sf.php?df=22", "dfd.com/sdgfdg.php?ereg=1", "normalsite.com"]
for site in sites:
    if "?" in site:
        print site

Result:

site.com/index.php?id=1
xyz.com/sf.php?df=22
dfd.com/sdgfdg.php?ereg=1

As you can see, the site without parameters was not printed.

answered May 07 '14 at 02:22

Ebrahim Hegazy

How about a URL like this (http://example.com/?)? It would be a positive match, but it has no valid query string. – Mohamed abdelatty May 08 '14 at 23:21
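
One way to handle the edge case raised in this comment, sketched here under some assumptions: parse the URL with urlparse and then require that parse_qsl finds at least one parameter in the query. The function name `has_query_pairs` is hypothetical, and accepting https as well as http is an assumption (the question's code only checks for http).

```python
try:
    from urllib.parse import urlparse, parse_qsl  # Python 3
except ImportError:
    from urlparse import urlparse, parse_qsl      # Python 2.7


def has_query_pairs(url):
    """Return True if url has a scheme, a host, and at least one query parameter."""
    parts = urlparse(url)
    # Assumption: both http and https are acceptable schemes.
    if parts.scheme not in ('http', 'https') or not parts.netloc:
        return False
    # keep_blank_values=True keeps parameters such as 'q=' whose value is empty.
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    return len(pairs) > 0


for candidate in ["http://www.compileonlinecom/execute_python_online.php?q=",
                  "http://example.com/?",
                  "site.com/index.php?id=1"]:
    print("%s -> %s" % (candidate, has_query_pairs(candidate)))
```

With this check, a bare trailing question mark is rejected because the parsed query is empty. Scheme-less entries such as the ones in the answer's list are rejected too, so they would need a scheme added first if they should count.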
