Questions tagged [search-engine]

A search engine is a program that searches documents for specified keywords and returns a list of the documents where the keywords were found.

Although "search engine" really names a general class of programs, the term is often used specifically to describe systems like Google, Yahoo!, Yandex, and Excite that enable users to search for documents on the World Wide Web and USENET newsgroups.
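
As a rough illustration of the definition above, here is a minimal sketch of keyword search over a handful of documents using an inverted index. The function names (`build_index`, `search`) and the sample documents are hypothetical and chosen only for this example. Real engines add tokenization, stemming, ranking, and much more.

```python
from collections import defaultdict


def build_index(documents):
    """Map each lowercase word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the sorted ids of documents containing every keyword in the query."""
    keywords = query.lower().split()
    if not keywords:
        return []
    # Intersect the posting sets so that only documents matching all keywords remain.
    matches = set.intersection(*(index.get(word, set()) for word in keywords))
    return sorted(matches)


# Hypothetical sample documents, keyed by an illustrative id.
documents = {
    "doc1": "solr is an enterprise search server",
    "doc2": "nutch is a web crawler",
    "doc3": "sphinx is a full text search server",
}

index = build_index(documents)
results = search(index, "search server")
print(results)
```

The index is built once and each query then costs only a few set lookups, which is the main reason search servers index documents up front instead of scanning them on every request.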

2920 questions
21
votes
10 answers

What are some Search Servers out there?

I'm looking for alternatives to Solr from the Apache Software Foundation. For those who don't know, Solr is an enterprise search server. A client application uses a web-service-like interface to submit documents for indexing and also to…
bpapa
  • 21,409
  • 25
  • 99
  • 147
21
votes
13 answers

How would you design a good search UI?

I want to provide my users with an 'advanced' search engine. I basically have a lot of search criteria to choose from: some are very simple/common and will be widely used (i.e. time period, item ID), some are a bit less mainstream, and some won't be…
Brann
  • 31,689
  • 32
  • 113
  • 162
21
votes
2 answers

Recommendable Maven repository search engines?

mavensearch.net doesn't know current versions in many cases; mvnrepository.com is a bit more up to date but doesn't show the repositories from which a package can be downloaded, which I would find very useful. What Maven repository search engines do…
deamon
  • 89,107
  • 111
  • 320
  • 448
21
votes
4 answers

Strategy for how to crawl/index frequently updated webpages?

I'm trying to build a very small, niche search engine, using Nutch to crawl specific sites. Some of the sites are news/blog sites. If I crawl, say, techcrunch.com, and store and index their frontpage or any of their main pages, then within hours my…
OdieO
  • 6,836
  • 7
  • 56
  • 88
20
votes
7 answers

Google search console fails to fetch sitemaps | "Sitemap could not be read"

I generated a sitemap with an online generator. It seems to work, and I even tested it with the old Google Search Console sitemap tester, where it works. But when I submit it in both versions, it just displays an error message.
user9480491
20
votes
5 answers

An alternative web crawler to Nutch

I'm trying to build a specialised search engine website that indexes a limited number of websites. The solution I came up with is: using Nutch as the web crawler, using Solr as the search engine, and the front end and the site logic are coded with…
wassimans
  • 8,382
  • 10
  • 47
  • 58
20
votes
2 answers

ElasticSearch: search inside the array of objects

I have a problem with querying objects in an array. Let's create a very simple index, add a type with one field, and add one document with an array of objects (I use the Sense console): PUT /test/ PUT /test/test/_mapping { "test": { "properties": { …
Nikita
  • 4,435
  • 3
  • 24
  • 44
19
votes
5 answers

Internationalization and Search Engine Optimization

I'd like to internationalize my site such that it's accessible in many languages. The language setting will be detected in the request data automatically, and can be overridden in the user's settings / stored in the session. My question pertains to…
Matt Huggins
  • 81,398
  • 36
  • 149
  • 218
19
votes
5 answers

How to prevent staging from being indexed by search engines

I would like my staging websites not to be indexed by search engines (Google first of all). I have heard WordPress is good at this, but I would like to be technology-agnostic. Is robots.txt enough? We would like to keep anonymous…
toutpt
  • 5,145
  • 5
  • 38
  • 45
19
votes
4 answers

How does a full text search server like Sphinx work?

Can anyone explain in simple words how a full text server like Sphinx works? In plain SQL, one would use SQL queries like this to search for certain keywords in texts: select * from items where name like '%keyword%'; But in the configuration files…
0x4a6f4672
  • 27,297
  • 17
  • 103
  • 140
18
votes
1 answer

Is it possible to link directly to Google search results using href?

I would like to link directly to a search results page from a standard link. To give an example of what I'm hoping for, here is some pseudocode: Click here to search…
Frank
  • 2,050
  • 6
  • 22
  • 40
18
votes
10 answers

How does a search engine rank millions of pages within 1 second?

I understand the basics of search engine ranking, including the ideas of "reverse index", "vector space model", "cosine similarity", "PageRank", etc. However, when a user submits a popular query term, it is very likely that millions of pages…
user1036719
  • 1,036
  • 3
  • 15
  • 32
17
votes
6 answers

SOLR Permissions / Filtering Results depending on Access Rights

For example, I have Documents A, B, C. User 1 must only be able to see Documents A, B. User 2 must only be able to see Document C. Is it possible to do it in SOLR without filtering by metadata? If I use metadata filter, every time there are access…
Manny
  • 6,277
  • 3
  • 31
  • 45
17
votes
4 answers

Is it possible to control the crawl speed by robots.txt?

We can tell bots to crawl or not to crawl our website in robots.txt. On the other hand, we can control the crawling speed in Google Webmasters (how much Googlebot crawls the website). I wonder if it is possible to limit the crawler activities by…
Googlebot
  • 15,159
  • 44
  • 133
  • 229
17
votes
4 answers

Ruby on Rails, How to determine if a request was made by a robot or search engine spider?

I have Rails apps that record the IP address of every request to a specific URL, but in my IP database I've found Facebook block IPs like 66.220.15.* and Google IPs (I suspect they come from bots). Is there any formula to determine whether an IP from a request was…
Agung Prasetyo
  • 4,353
  • 5
  • 29
  • 37