I have URLs like example.com/post/alai-fm-sri-lanka-listen-online-1467/
I want to block all URLs that contain the word "post" using robots.txt.
So which is the correct format?
Disallow: /post-*
Disallow: /?page=post
Disallow: /*page=post
(Note that the file has to be called robots.txt; I corrected it in your question.)
You only included one example URL, where "post" is the first path segment. If all your URLs look like that, the following robots.txt should work:
User-agent: *
Disallow: /post/
It would block the following URLs:
http://example.com/post/
http://example.com/post/foobar
http://example.com/post/foo/bar
The following URLs would still be allowed:
http://example.com/post
http://example.com/foo/post/
http://example.com/foo/bar/post
http://example.com/foo?page=post
http://example.com/foo?post=1
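If you want to sanity-check a rule like this yourself, Python's standard-library robots.txt parser handles plain prefix rules (no wildcards). A minimal sketch using the example URLs above:

from urllib.robotparser import RobotFileParser

# Parse the proposed rules straight from a string.
rp = RobotFileParser()
rp.parse("User-agent: *\nDisallow: /post/".splitlines())

urls = [
    "http://example.com/post/",        # blocked
    "http://example.com/post/foobar",  # blocked
    "http://example.com/post",         # allowed (no trailing slash)
    "http://example.com/foo/post/",    # allowed ("post" is not the first segment)
]
for url in urls:
    print(url, "->", "allowed" if rp.can_fetch("*", url) else "blocked")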
Googlebot and Bingbot both handle limited wildcarding, so this will work:
Disallow: /*post
Of course, that will also disallow any URL whose path contains the substring "post" anywhere, including words like "compost", "outpost", and "poster".
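To see the over-matching concretely: Google treats "*" roughly like ".*" in a regular expression, matched from the start of the path. A rough illustration (this is not Google's actual parser):

import re

# Rough stand-in for how a wildcard-aware crawler reads "Disallow: /*post".
pattern = re.compile(r"/.*post")
for path in ["/post/alai-fm-sri-lanka-listen-online-1467/",
             "/compost-bins", "/free-posters"]:
    print(path, "->", "blocked" if pattern.match(path) else "allowed")
# All three are blocked, including the "compost" and "poster" paths.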
You could try to make it a little better. For example:
Disallow: /*/post   # "post" at the start of a later path segment
Disallow: /*?post=  # a "post" query parameter
Disallow: /*=post   # any query value that starts with "post"
Understand, though, that not all bots support wildcards, and some of those that do are buggy. Bing and Google handle them correctly, but there's no guarantee that other bots will.
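The standard library's robotparser ignores wildcards, so testing rules like these needs a small matcher of your own. Here is a minimal sketch that translates a Google-style rule into a regex prefix match (it assumes only "*" wildcards and ignores the "$" end anchor); it shows the refined rules skip the "compost"-style paths while still catching the intended ones:

import re

def rule_blocks(rule: str, path: str) -> bool:
    # Escape regex metacharacters, then turn the robots.txt "*"
    # wildcard back into "match anything". Rules match from the
    # start of the path.
    pattern = re.escape(rule).replace(r"\*", ".*")
    return re.match(pattern, path) is not None

rules = ["/*/post", "/*?post=", "/*=post"]
paths = [
    "/foo/post-title",   # blocked by /*/post
    "/foo?post=1",       # blocked by /*?post=
    "/foo?page=post",    # blocked by /*=post
    "/compost-bins",     # allowed
    "/free-posters",     # allowed
]
for path in paths:
    blocking = [r for r in rules if rule_blocks(r, path)]
    print(path, "->", blocking or "allowed")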