I'm using crawler4j and want to visit only URLs that match a certain pattern, but I couldn't work out a regex for this URL:

http://www.site.com/liste/product_name_changable/productDetails.aspx?productId={id}&categoryId={category_id}

I tried this:

liste\/*\/productDetails:aspx?productId=*&category_id=*

and

private final static Pattern FILTERS = Pattern.compile("^/liste/*/productDetails.aspx?productId=*$");

but neither of them works.

How can I write this as a regex pattern?

Muhammet Arslan

1 Answer

You have several errors in your regex. Each asterisk should be .+, which matches one or more characters; a bare * only repeats the token before it. The question mark needs to be escaped, and so does the dot in productDetails.aspx. category_id should be categoryId, and productDetails:aspx should be productDetails.aspx. With all of these fixes, the regex looks like this:

liste\/.+\/productDetails\.aspx\?productId=.+&categoryId=.+

Also, you shouldn't have ^ or $ at the start and end of the regex. Those anchors match the start and end of the input, so the pattern can't match when you're trying to find a portion of the URL, which you are.
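As a rough sketch of how the corrected regex could be used from Java: the class name ProductUrlFilter, the method isProductUrl and the sample URLs are made up for illustration and are not part of crawler4j. Backslashes are doubled because the regex sits inside a Java string literal, and find() is used so the pattern can match a portion of the URL.

```java
import java.util.regex.Pattern;

public class ProductUrlFilter {
    // Same regex as above, with backslashes doubled for the Java string literal.
    private static final Pattern PRODUCT_URL = Pattern.compile(
            "liste\\/.+\\/productDetails\\.aspx\\?productId=.+&categoryId=.+");

    // find() searches for the pattern anywhere in the URL,
    // so the scheme and host in front of "liste/" do not matter.
    public static boolean isProductUrl(String url) {
        return PRODUCT_URL.matcher(url).find();
    }

    public static void main(String[] args) {
        // Hypothetical URLs following the shape given in the question.
        String product = "http://www.site.com/liste/some_product/productDetails.aspx?productId=12&categoryId=3";
        String other = "http://www.site.com/liste/some_product/other.aspx?productId=12";

        System.out.println(isProductUrl(product)); // true
        System.out.println(isProductUrl(other));   // false
    }
}
```

Note that Matcher.matches() requires the whole input to match, so if your filter calls matches() instead of find(), the pattern would also need to cover the part of the URL before liste, for example with a leading .* in front of it.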

kabb