I want to block web crawlers from accessing any page other than page1.
The following rule should block all directories/file names containing the word "page", so something like /localhost/myApp/page2.xhtml should be blocked:
    Disallow: /*page
The following rule should keep all directories/file names containing "page1" accessible, so something like /localhost/myApp/page1.xhtml should not be blocked:
    Allow: /*page1
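For reference, a minimal version of the file I am testing with looks like this (I have added a User-agent: * line, which applies the rules to every crawler, and listed Allow first for parsers that use first-match semantics; under Google's longest-match rules the order would not matter, since /*page1 is the longer, more specific rule):

    User-agent: *
    Allow: /*page1
    Disallow: /*page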
The problem is that crawler4j seems to be ignoring the asterisks, which are used as wildcards. Is something wrong with my robots.txt, or are asterisk wildcards something crawler4j does not interpret by default? (Wildcards are an extension to the original robots exclusion standard, so not every parser supports them.)
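If crawler4j's robots.txt parsing turns out not to support wildcards, one fallback I am considering is enforcing the same rule inside the crawler itself. Here is a minimal sketch, assuming crawler4j 4.x, where shouldVisit takes a referring Page (older versions use shouldVisit(WebURL url)); the class name MyCrawler is just a placeholder:

    import edu.uci.ics.crawler4j.crawler.Page;
    import edu.uci.ics.crawler4j.crawler.WebCrawler;
    import edu.uci.ics.crawler4j.url.WebURL;

    public class MyCrawler extends WebCrawler {

        @Override
        public boolean shouldVisit(Page referringPage, WebURL url) {
            String href = url.getURL().toLowerCase();
            // Block anything containing "page" unless it also contains
            // "page1", mirroring Disallow: /*page plus Allow: /*page1.
            if (href.contains("page") && !href.contains("page1")) {
                return false;
            }
            return true;
        }
    }

That would sidestep the robots.txt question entirely, but I would still prefer to know whether the wildcard syntax itself is at fault.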