Questions tagged [sitemap]

20 questions
11
votes
4 answers

How does Wikipedia generate its Sitemap?

The topic interests me because of Wikipedia's size. It may be easy to set up a few cron jobs to update the sitemaps periodically on a small site, but what about a big one? So: How does Wikipedia generate its Sitemap?
user10608
6
votes
4 answers

How can I protect my sitemap index file and sitemap.xml files from leechers?

I have a "content" website that some leechers and 419 scammers love to crawl aggressively, which also generates costs and performance issues. :( I have no choice: I need to prevent them from accessing the sitemap files and index. :( I am doing the same as…
Toto
  • 293
  • 1
  • 4
  • 12
4
votes
1 answer

How to force Nginx to override a header?

I'm trying to display my sitemaps. Browsers display my sitemap index as XML but treat post sitemaps as plain text. I tried to override the content type with the configuration below, but it didn't help. location ~ \.xml$ { proxy_hide_header…
3
votes
2 answers

How many types of sitemaps are there?

I was perplexed to find two different sitemaps in Google Sites: http://sites.google.com/site/(name of the site)/system/feeds/sitemap and http://sites.google.com/site/(name of the site)/system/app/pages/sitemap/hierarchy Now, I am ready to ask the…
user10608
2
votes
0 answers

How to use sitemap.xml to create a static mirror of a CMS

Is there a tool to create a static mirror of a content management system (CMS) that provides a sitemap.xml file? Ideally, I would point a tool like wget or curl to a sitemap.xml file and have it automatically sync the static directories using the…
Lee Joramo
  • 21
  • 1
2
votes
2 answers

.htaccess - Redirect all URLs with one exception

I want to redirect all URLs from one domain to another. Some of the old URLs have new counterparts with specific pages to redirect to. All other URLs should redirect to the homepage of the new domain. But I don't want to redirect the sitemap.xml. So…
Cray
  • 135
  • 5
2
votes
1 answer

How do I exclude my sitemap from an .htaccess redirect?

I want all URLs to be redirected, except my sitemap XML file in the root directory. The .htaccess should allow https://old-domain/xml.xml to resolve with HTTP 200, but it is still redirecting to the new domain at the moment. How can I exclude the…
Till Noah
  • 21
  • 2
1
vote
3 answers

Can I protect my sitemap.xml so that only search engines can download it?

I'm planning to add a bunch of aggregated lists of pages to my sitemaps that I don't want to make too easy for outsiders to screen-scrape. Can I protect my sitemap.xml so that only search engines can download it? Install a firewall? I'm using IIS6.…
Niels Bosma
  • 243
  • 1
  • 4
  • 15
1
vote
0 answers

Google Sitemap Generator: only the default hostname is listed

I successfully installed the Google Sitemap Generator on a Kimsufi server running Debian with Apache 2.2. But when I go to http://example.com:8181, only the default hostname is listed, so I can't configure the other hosted websites. I installed…
Snyf
  • 111
  • 1
1
vote
1 answer

in sitemap after moving from Apache to Nginx

I have a sitemap named http://www.domain.com/sitemap1.php. It starts with this code:
1
vote
1 answer

Store sitemaps off-site

We have an Nginx web server, and sitemaps that we generate every week or so. We recently migrated to multiple web servers behind a single load balancer, and keeping sitemaps on every web server seems kind of silly. As we are on AWS, is there a way to store…
Katafalkas
  • 523
  • 2
  • 8
  • 20
1
vote
1 answer

Multilanguage Google sitemap

Masters, we translated our site into English and I'm a little bit confused about sitemap.xml. Until now, we have had a sitemap like this:
holian
  • 227
  • 1
  • 8
  • 14
1
vote
3 answers

Unable to generate a sitemap with Google's generator

I would like to generate a sitemap from my university account, using a cron job that continuously runs the sitemap_gen.py file. The sitemap is for my site at Google Sites and particularly for the users of the site, not only for search engines. How…
0
votes
3 answers

Why didn't Google crawl everything in sitemap.xml?

There are 3000 entries inside sitemap.xml, but it turns out that Google crawls just 300 of them. What's the problem?
Mask
0
votes
2 answers

Google Mini ignoring sitemap

I'm in the process of setting up a Google Mini device to index our site, which has a lot of dynamically generated content. I've created a dynamic site.map file which lists all of the dynamic URLs. This is currently being indexed by Google but…
Dave Barker
  • 111
  • 2