
I am using an online tool to crawl my client's website and provide a list of pages / URLs that exist on it.

There is an option to exclude pages, and it gives a regex example of \?.*page=.*$

I would like to ignore everything in the news section (apart from the News page itself).

So would I go with the following?

\?.*news/.*$
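Before pasting a pattern like this into the crawler, you can check it against a few sample URLs. The Python sketch below assumes that the tool tests the pattern against the full URL and accepts a match anywhere in it (like re.search). The tool's real matching rules may differ, and the example.com URLs are hypothetical.

```python
import re

# Pattern proposed in the question. The leading \? requires a literal "?"
# to appear somewhere before "news/".
proposed = re.compile(r"\?.*news/.*$")

# Hypothetical URLs on an example site.
urls = [
    "https://example.com/news/",
    "https://example.com/news/some-article",
    "https://example.com/index.php?section=news/archive",
]

for url in urls:
    # re.search finds a match anywhere in the string. How the crawler
    # actually applies the exclude pattern is an assumption here.
    print(url, "excluded" if proposed.search(url) else "kept")
```

Under that assumption, only the last URL is excluded. The leading \? needs a literal ? before news/, so ordinary article paths such as /news/some-article are kept.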

1 Answer


If I understand you correctly, you're looking for a regex that matches news/foo or news/foo/bar, but not news/.

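You can use this regex for that: .*news/.+

Note that the \? at the start of your pattern matches a literal question mark. As a result, \?.*news/.*$ would only exclude URLs where news/ appears after a ? (in the query string), and it would not exclude ordinary paths like /news/foo.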

.* any sequence of 0 or more characters (whatever comes before news/)

news/ the literal text news/

.+ 1 or more characters after news/, so news/ on its own does not match
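To check the pattern against the requirement above, the following Python sketch runs it over a few hypothetical URLs. It assumes that the crawler applies the pattern to the full URL and accepts a match anywhere in it (re.search semantics). The tool's actual behaviour may differ, and is_excluded is a hypothetical helper name used only for this illustration.

```python
import re

# Regex from the answer: anything, then "news/", then at least one more character.
exclude = re.compile(r".*news/.+")

def is_excluded(url: str) -> bool:
    """Return True if the URL matches the exclude pattern.

    Hypothetical helper: assumes the pattern is searched for anywhere in
    the full URL (re.search), which may not match the crawler's rules.
    """
    return exclude.search(url) is not None

# Hypothetical sample URLs and whether each one should be excluded.
samples = {
    "https://example.com/news/": False,        # the News page itself stays
    "https://example.com/news/foo": True,      # an article is excluded
    "https://example.com/news/foo/bar": True,  # nested pages are excluded
    "https://example.com/about/": False,       # unrelated pages stay
}

for url, expected in samples.items():
    result = is_excluded(url)
    print(f"{url}: {'excluded' if result else 'kept'}")
    assert result == expected
```

With these samples, only the URLs that have something after /news/ are excluded. The pattern also matches any URL containing news/ elsewhere, such as /latest-news/item. If the site has paths like that, you may need a stricter pattern.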

http://regexr.com/3ffj1

jhscheer