I have a database with fields such as URL, IP, country, state, etc.

I need to crawl the URLs and map the database values to each crawled URL.

For example, if I have the URL http://www.google.com and the country USA, I need to map the country to the crawled data for that URL.

I tried the database crawler, but it is very slow because I have 5 million URLs.

Is there any other way to map the database values to the crawled data with OpenSearchServer?

Thanks in advance.

Vishnu Lal

1 Answer

Are you using OpenSearchServer 1.3.1?

If so, it is possible to use a separate index to store the location parameters (provided by the database).

In the search request, you can then use a "join query" to retrieve the data from the meta-data index at the same time as the full-text search.

So you will have two indexes:

1. An index with the usual columns: title, content, URL, hostname...
2. An index with the meta-data: country, state, IP, and the URL and/or hostname (used as a foreign key).

In the field map of the meta-data index, don't check the URL checkbox, so that the page is not crawled. The URL is only needed there to do the join with the crawl index.
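To make the two-index idea concrete, here is a minimal sketch in plain Python. It is not OpenSearchServer's API, only an illustration of what a join on the URL key does. The function name `join_on_url` and all sample records (including the state and IP values) are made up for the example.

```python
# Conceptual illustration only: this is NOT OpenSearchServer code.
# It mimics a join between a crawl index and a meta-data index,
# using the URL as the foreign key.

# Hypothetical crawl index: what the web crawler produced.
crawl_index = [
    {"url": "http://www.google.com", "title": "Google", "content": "Search the web"},
    {"url": "http://www.example.com", "title": "Example", "content": "Example domain"},
]

# Hypothetical meta-data index: rows loaded from the database.
metadata_index = [
    {"url": "http://www.google.com", "country": "USA",
     "state": "California", "ip": "192.0.2.1"},
]


def join_on_url(crawl_docs, meta_docs):
    """Attach meta-data fields to each crawl document with the same URL."""
    meta_by_url = {doc["url"]: doc for doc in meta_docs}
    joined = []
    for doc in crawl_docs:
        merged = dict(doc)
        # Documents without a matching meta-data row are kept unchanged.
        for key, value in meta_by_url.get(doc["url"], {}).items():
            if key != "url":
                merged[key] = value
        joined.append(merged)
    return joined


results = join_on_url(crawl_index, metadata_index)
for doc in results:
    print(doc["url"], doc.get("country"))
```

The point of the layout is that the crawl index and the meta-data index are filled independently, so the 5 million database rows never have to wait for the crawler.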

Indexing 5 million small records from MySQL should be fast (about 10 minutes). Did you set the buffer size appropriately? For short records, a large buffer will speed up indexing.
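As a rough illustration of why the buffer size matters, the sketch below reads rows in batches instead of one at a time. It assumes the buffer size corresponds to the number of rows handled per batch, and it uses Python's built-in `sqlite3` module as a stand-in for MySQL. The table name `url_meta` and the helper `iter_batches` are hypothetical.

```python
import sqlite3


def iter_batches(conn, batch_size):
    """Yield lists of at most batch_size rows from the url_meta table."""
    cursor = conn.execute("SELECT url, country FROM url_meta ORDER BY rowid")
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        yield rows


# In-memory database with a few hypothetical rows (stand-in for MySQL).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE url_meta (url TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO url_meta VALUES (?, ?)",
    [(f"http://site{i}.example", "USA") for i in range(10)],
)

# With a batch size of 4, the 10 rows arrive in 3 round trips instead of 10.
batches = list(iter_batches(conn, 4))
batch_sizes = [len(batch) for batch in batches]
print(batch_sizes)
```

With short records, a larger batch means fewer round trips between the database and the indexer, which is where most of the time goes.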

Another possible reason for the slowness is the amount of memory allocated to OpenSearchServer. Have a look at the Runtime/System panel to check that there is enough memory available. Usually, an index of 5 million documents requires between 2 and 4 GB of memory.

You may also consider using the crawl cache. It lets you change the index configuration and restart a crawl session without actually fetching the URLs again: if a page is available in the crawl cache, the cached copy is used.
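The behaviour of a crawl cache can be sketched as follows. This is a generic illustration, not OpenSearchServer's implementation: the class `CrawlCache` and the function `fake_fetch` are invented for the example, and the fetcher returns a placeholder string instead of making a network request.

```python
class CrawlCache:
    """Keep fetched pages so a later crawl session can reuse them."""

    def __init__(self, fetcher):
        self._fetcher = fetcher   # function that actually downloads a URL
        self._pages = {}          # url -> page content
        self.fetch_count = 0      # number of real fetches performed

    def get(self, url):
        # Only call the fetcher when the page is not cached yet.
        if url not in self._pages:
            self._pages[url] = self._fetcher(url)
            self.fetch_count += 1
        return self._pages[url]


def fake_fetch(url):
    """Hypothetical fetcher: a placeholder for a real HTTP download."""
    return f"<html>content of {url}</html>"


cache = CrawlCache(fake_fetch)
first = cache.get("http://www.google.com")   # first session: real fetch
second = cache.get("http://www.google.com")  # second session: served from cache
print(cache.fetch_count)
```

Re-indexing after a configuration change then costs only the indexing time, not the download time.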

Emmanuel Keller