
I need a sitemap to help both people and Google discover the pages. I've tried the WebSphinx application.

I noticed that if I put wikipedia.org as the starting URL, it will not crawl further.

So how do I actually crawl all of Wikipedia? Can anyone give me some guidelines? Do I need to go and find those URLs myself and supply multiple starting URLs?

Does anyone have suggestions for a good website with a tutorial on using WebSphinx's API?

raminmm

1 Answer


Crawling Wikipedia is a bad idea: it is hundreds of TBs of data uncompressed. Instead, work offline with the various dumps Wikipedia provides. Find them here: https://dumps.wikimedia.org/

You can create a sitemap for Wikipedia using the page metadata, external links, interwiki links, and redirects databases, to name a few.
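As a rough sketch of the dump-based approach: the dumps include an "all-titles-in-ns0" file, one article title per line, which is enough to generate sitemap XML without crawling anything. The filename below is illustrative, and the 50,000-URL cap per sitemap file comes from the sitemap protocol.

```python
import gzip
import xml.etree.ElementTree as ET
from urllib.parse import quote

# Assumption: you downloaded a titles dump such as
# enwiki-latest-all-titles-in-ns0.gz from https://dumps.wikimedia.org/
DUMP = "enwiki-latest-all-titles-in-ns0.gz"
BASE = "https://en.wikipedia.org/wiki/"
LIMIT = 50_000  # the sitemap protocol allows at most 50,000 URLs per file

def read_titles(dump_path):
    """Yield one article title per line from the gzipped dump."""
    with gzip.open(dump_path, "rt", encoding="utf-8") as f:
        for line in f:
            title = line.strip()
            if title:
                yield title

def write_sitemap(titles, path):
    """Write a single sitemap XML file for the given titles."""
    urlset = ET.Element(
        "urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
    for title in titles:
        url = ET.SubElement(urlset, "url")
        # Dump titles already use underscores; quote() handles non-ASCII.
        ET.SubElement(url, "loc").text = BASE + quote(title)
    ET.ElementTree(urlset).write(path, encoding="utf-8",
                                 xml_declaration=True)

if __name__ == "__main__":
    # Split the full title list into LIMIT-sized sitemap files.
    batch, n = [], 0
    for title in read_titles(DUMP):
        batch.append(title)
        if len(batch) == LIMIT:
            write_sitemap(batch, f"sitemap-{n}.xml")
            batch, n = [], n + 1
    if batch:
        write_sitemap(batch, f"sitemap-{n}.xml")
```

For the full site you would then list the generated files in a sitemap index file, since one sitemap cannot hold millions of URLs.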

Shreyas Chavan