33

Google, Gmail, etc. don't offer partial or prefix search (e.g. stuff*), though it could be very useful. I often can't find a mail in Gmail because I don't remember the exact expression.

I know there is stemming and such, but it's not the same, especially if we talk about languages other than English.

Why doesn't Google add such a feature? Is it because the index would explode? But databases offer partial search, so surely there are good algorithms to tackle this problem.

What is the problem here?

6 Answers

8

Google doesn't actually store the text that it searches. It stores search terms, links to the page, and where in the page each term occurs. That data structure is indexed in the traditional database sense. I'd bet using wildcards would make the index of the index pretty slow and, as Developer Art says, not very useful.
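To make the "terms, pages, positions" structure concrete, here is a minimal sketch of a positional inverted index in Python. It is a toy under stated assumptions, not how Google actually implements it: the function names (`build_index`, `exact_lookup`, `prefix_lookup`) and the sample documents are made up for illustration. An exact term is a single dictionary probe, while a prefix query has to walk a sorted vocabulary and merge every matching posting list.

```python
from bisect import bisect_left
from collections import defaultdict

def build_index(docs):
    """Map each term to a list of (doc_id, position) postings."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.lower().split()):
            index[term].append((doc_id, pos))
    return dict(index)

def exact_lookup(index, term):
    """One dictionary probe, regardless of vocabulary size."""
    return index.get(term, [])

def prefix_lookup(index, sorted_terms, prefix):
    """Binary-search the sorted vocabulary, then merge all matching posting lists."""
    postings = []
    i = bisect_left(sorted_terms, prefix)
    while i < len(sorted_terms) and sorted_terms[i].startswith(prefix):
        postings.extend(index[sorted_terms[i]])
        i += 1
    return sorted(postings)

# Hypothetical sample data, for illustration only.
docs = {
    1: "the car hit the carpet",
    2: "carrots and cars",
}
index = build_index(docs)
sorted_terms = sorted(index)

print(exact_lookup(index, "car"))
print(prefix_lookup(index, sorted_terms, "car"))
```

The prefix query touches four posting lists here (car, carpet, carrots, cars) instead of one. On a web-scale vocabulary a short prefix could match a very large number of terms, which is one plausible reason the feature is expensive.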

pts
Byron Whitlock
  • 3
    I've found a search engine that can do prefix (stuff*) and proximity search! [Exalead Web Search](http://www.exalead.com/search/). Click Advanced Search to find these options. – Hugh Brackett Feb 03 '11 at 17:53
  • 2
    @HughBrackett Thanks for the hint, but I am afraid Exalead have meanwhile removed the Prefix search from their options even in Advanced search. :-( It is nowhere to be found. – syntaxerror Mar 19 '15 at 17:46
8

Google does search partial words. Gmail does not, though. Since you ask what the problem is here, my answer is lack of effort. This problem has a solution that allows searching in time proportional to the length of the query (independent of the amount of indexed text) and in linear space, though it is not very cache-friendly: suffix trees. Suffix arrays are another option that is more cache-friendly and still time-efficient.
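As a rough illustration of the suffix array idea, here is a small Python sketch. It is a demo, not a production index: the build step is the naive sort of all suffixes (real implementations use linear or near-linear construction algorithms), and the names `build_suffix_array` and `find_all` are made up for this example. A query is two binary searches, so it costs about O(m log n) character comparisons for a pattern of length m over a text of length n.

```python
def build_suffix_array(text):
    """Return suffix start offsets in lexicographic order (naive demo build)."""
    return sorted(range(len(text)), key=lambda i: text[i:])

def find_all(text, sa, pattern):
    """Return the sorted start offsets of every occurrence of pattern."""
    m = len(pattern)

    # Lower bound: first suffix whose first m characters are >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose first m characters are > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # Every suffix in sa[start:lo] begins with pattern.
    return sorted(sa[start:lo])

# Hypothetical sample text, for illustration only.
text = "locations and locators"
sa = build_suffix_array(text)
print(find_all(text, sa, "locat"))
print(find_all(text, sa, "cat"))
```

Because every substring of the text is a prefix of some suffix, the same structure answers prefix, infix and suffix queries alike.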

Rui Ferreira
  • An alternative to suffix trees is N-grams, which are performant, just not storage-efficient. But a solution nonetheless. – Cody Caughlan Dec 03 '09 at 01:22
  • 1
    Another alternative is to stop using Gmail and go back to using Outlook. :-) – Marco Demaio Oct 22 '12 at 16:15
  • 1
    Though Google does search partial *words*, it won't match on partial *numbers*! That must be distinguished. If you, for instance, look for a partial serial number, you are likely to get no relevant results at all. I've now tried that enough times to believe it does not work. – syntaxerror Mar 19 '15 at 17:33
  • *"Google does search partial words."* it does not, AFAIK. It searches synonyms, though. – nitely Jun 11 '16 at 21:50
  • If I do a Google search for "locat" my results include "locations" and "locator". It does not say "did you mean locate?", nor does it show the message "showing results for locate. See results for locat instead" like it does if you have an obvious typo (indeed, the first few results are matches for an acronym "LOCAT"). It really looks like it's matching words that begin with "locat". How would we test to determine if it's searching partial words vs synonyms? – bobpaul Aug 22 '17 at 22:22
6

It is possible via Google Docs; follow this article:

http://www.labnol.org/internet/advanced-gmail-search/21623/

pbaranski
  • 1
    Weird solution, but it *does* work. Not for daily use, but it can be very helpful when searching for a specific regex pattern (use the label `all`). – DinGODzilla Jan 15 '17 at 19:45
3

Google Code Search can search based on regular expressions, so they do know how to do it. Of course, the amount of data Code Search has to index is tiny compared to web search. Using regex or wildcard search in web search would increase index size and decrease performance to impractical levels.
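To get a feel for the index-size cost, here is a minimal sketch of an N-gram (trigram) index, the approach mentioned in a comment on another answer. This is an assumption chosen for illustration, not a claim about what any Google product does internally, and the names `trigrams`, `build_trigram_index` and `substring_search` are made up. Every 3-character window of every document becomes an index key, so the index carries far more entries than a plain word index, but any substring of three or more characters can be answered by intersecting posting sets and then verifying the candidates.

```python
from collections import defaultdict

def trigrams(s):
    """Return the set of all 3-character windows in s."""
    return {s[i:i + 3] for i in range(len(s) - 2)}

def build_trigram_index(docs):
    """Map each trigram to the set of doc ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for gram in trigrams(text.lower()):
            index[gram].add(doc_id)
    return index

def substring_search(docs, index, query):
    """Return sorted ids of docs containing query as a substring."""
    query = query.lower()
    grams = trigrams(query)
    if not grams:
        # Queries shorter than 3 characters cannot use the index: scan everything.
        candidates = set(docs)
    else:
        candidates = set.intersection(*(index.get(g, set()) for g in grams))
    # Sharing all trigrams does not guarantee a match, so verify each candidate.
    return sorted(d for d in candidates if query in docs[d].lower())

# Hypothetical sample data, for illustration only.
docs = {1: "store locator", 2: "all locations", 3: "local news"}
index = build_trigram_index(docs)
print(substring_search(docs, index, "locat"))
print(len(index), "trigram keys for", len(docs), "short documents")
```

The final verification step matters: the trigram filter only narrows the candidate set, and the real match is confirmed against the document text.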

interjay
0

The secret to finding anything in Google is to enter a combination of search terms (or quoted phrases) that are very likely to be in the content you are looking for, but unlikely to appear together in unrelated content. A wildcard expression does the opposite of this. Just enter the terms you expect the wildcard to match, keeping in mind that Google will do stemming for you. Back in the days when computers ran on steam, Lycos (iirc) had pattern matching, but they turned it off several years ago. I presume it was putting too much load on their servers.

Hugh Brackett
-1

Because you can't sensibly derive what is meant by car*:

Cars? Carpets? Carrots?

Google's algorithms compare document texts, as well as external inbound links, to determine what a document is about. With wildcards, all of these algorithms would produce junk.

  • 23
    It should return all results in this case. The user wants it, the user gets it. –  Dec 02 '09 at 19:16
  • I suppose it could be done technically but for most humans it would probably make no sense. Maybe submit a request to Google. Who knows, maybe it's a great idea they simply missed? –  Dec 02 '09 at 19:17