Questions tagged [google-search]

SEARCH ENGINE OPTIMIZATION (SEO) IS OFF-TOPIC. This tag is only for programming questions about the Google search engine.

Google is the most popular search engine in the world. The Google Web Search API has been deprecated in favor of the newer Custom Search API.
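For programmatic searches, the Custom Search JSON API is queried with an ordinary HTTPS GET request. Below is a minimal sketch. It assumes you already have an API key and a search engine ID (`cx`). The values shown are placeholders, and `build_cse_url` is a hypothetical helper name. The sketch only builds the request URL, so it runs without network access. Fetching that URL (for example with `urllib.request.urlopen`) returns JSON whose `items` list holds the results.

```python
from urllib.parse import urlencode

# Endpoint of the Custom Search JSON API.
CSE_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_cse_url(api_key: str, engine_id: str, query: str) -> str:
    """Build a Custom Search JSON API request URL (hypothetical helper).

    api_key and engine_id are placeholders here; real values come from
    your own Google Cloud project and search engine configuration.
    """
    params = {"key": api_key, "cx": engine_id, "q": query}
    # urlencode percent-encodes special characters and turns spaces into '+'.
    return f"{CSE_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical credentials: replace them before sending a real request.
    print(build_cse_url("YOUR_API_KEY", "YOUR_ENGINE_ID", "double unary"))
```

Building the URL with `urlencode` rather than by string concatenation ensures that queries containing punctuation such as `@`, `#` or `&` are transmitted intact.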

A Google search may not return the results you expect, for reasons that include those mentioned in the answers and comments to "What can you NOT find on Google?":

Google does not even attempt

  • To search for keywords that are special characters:

"Generally, punctuation is ignored, including @#$%^&*()=+[]\ and other special characters" -Franck Dernoncourt.

The search term "double unary" works, but "--" does not. See also "Google displays my website as a spelling error".

Sites with too much content, with content of little value, or that are impractical to index

May include:

  • Sites that don't have a crawlable site map and require Google to provide search terms to access the results available on the site might not be fully indexed. -Josephine Bonaparte
  • Smaller blogs that aren't regularly updated are often dumped from search results. Plus anything that they think is a splog (“a blog which the author uses to promote affiliated websites” -Wikipedia). -David
  • “Most of the Twitter content is not indexed by Google, even if it’s public.
    It used to be available to Google, but that’s no longer the case since their agreement expired.” -Alex
  • “Google does not index Tumblr all that well.
    Blog posts on Tumblr are easier to find using Tumblr search.” -David
  • “everything on Google Sites isn't (or is hardly) indexed.
    If you start a Google site, get your own domain.” -David

Copyright and other protected material

May include:

  • What the government thinks is not good for you. –Hellagot
    The example given was Germany, where Google “does not show thousands of sites … and the list increases by the thousands every year”.
  • What may infringe intellectual property rights. –einpoklum
    DMCA (Digital Millennium Copyright Act) was mentioned.
  • Census images.
    “Since the content [is] images that are often manually index[ed], they [are] usually found on paid-for sites like ancestry.com.” –amh

To see which URLs Google has been blocked from crawling, visit the Blocked URLs page of the Crawl section of Webmaster Tools.

Opt-outs

  • Content explicitly disallowed by a domain's robots.txt file is not crawled by Google, although a disallowed URL can still appear in the index (without its content) if other pages link to it. -amh
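You can check offline whether a given URL is disallowed for Googlebot by using Python's standard-library robots.txt parser. Below is a minimal sketch based on the hypothetical robots.txt shown; `may_fetch` is an illustrative helper name. `urllib.robotparser` uses simple prefix matching and does not implement every extension Google supports, such as wildcard paths. Treat its answer as an approximation of Googlebot's behavior, not a guarantee.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents (not taken from any real site).
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow: /
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines instead of downloading the file.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("Googlebot", "https://example.com/index.html"),
        ("Googlebot", "https://example.com/private/notes.html"),
        ("Bingbot", "https://example.com/index.html"),
    ]:
        print(agent, url, may_fetch(ROBOTS_TXT, agent, url))
```

Googlebot matches its own group, so only `/private/` is blocked for it. Any other crawler falls back to the `*` group, which disallows everything.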

Technical complications

  • Websites that are not linked from other websites Google already knows about (perhaps known from when the domain was under different ownership - Tim Post). There are probably many websites that are not linked from any visible page; unless they are manually submitted to Google via Webmaster Tools, the Google spider will never find them. -amh
  • Websites that are behind web forms that you need to fill out. –amh
  • The Deep Web “Most of the Web's information is buried far down on dynamically generated sites, and standard search engines do not find it. Traditional search engines cannot "see" or retrieve content in the deep Web—those pages do not exist until they are created dynamically as the result of a specific search. As of 2001, the deep Web was several orders of magnitude larger than the surface Web.” -Wikipedia
  • May include the 408 billion web pages saved over time, according to the Wayback Machine. -pnuts
1705 questions
14 votes, 4 answers

GET request to Google Search

I'm trying to get the HTML of Google search results by sending a GET request, for example to https://www.google.ru/?q=1111. In the browser everything is OK, but when I try it with curl, or get the source with "View source" in Google, there is…
Maximus
14 votes, 3 answers

If Google's homepage is so minimal, why is the source hundreds of lines of code?

The code is minified, but reformatted it is a few hundred lines of code. I'd imagine such a minimal page would have minimal code as well. What is Google doing that makes the source this long? I can see a lot of it is JavaScript, but I was under…
chrislgarry
14 votes, 2 answers

Google Custom Search with custom search box and button?

I am trying to make a Google custom search (I just need some sort of search engine on my site), and I need to be able to use my own search box (input field). I need it to be an exact size. I also need to be able to make my own button to…
Sean
14 votes, 1 answer

Should timestamps always use UTC?

Should timestamps always use UTC (as in 2012-06-14T10:32:11+00:00) and not local time (as in 2012-06-14T06:32:11-04:00 for New York)? References Although not a WordPress question, I believe it'll be a strong example -- the WordPress core, themes…
its_me
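The two example timestamps in this question denote the same instant and differ only in the UTC offset used to write them. Below is a minimal sketch using Python's standard `datetime` module (Python 3.7+ is required for `fromisoformat` with offsets) that makes this concrete. It illustrates the conversion only and does not settle which form should be stored.

```python
from datetime import datetime, timezone

# The two timestamps quoted in the question.
utc_stamp = datetime.fromisoformat("2012-06-14T10:32:11+00:00")
new_york_stamp = datetime.fromisoformat("2012-06-14T06:32:11-04:00")

# Offset-aware datetimes compare by the instant they denote, so these are equal.
print(utc_stamp == new_york_stamp)

# Converting the local-time value to UTC recovers the UTC form of the timestamp.
print(new_york_stamp.astimezone(timezone.utc).isoformat())
```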
13 votes, 2 answers

Sub-sitelinks in Google search results

I noticed that when people get search results from Google, it automatically shows sub-links under the main site link. Is it possible to modify these submenus for my own site when it appears as a result in Google? That is, when people search for my…
m_frog
13 votes, 3 answers

Is it possible for an Android app to temporarily disable "OK Google"?

I have an Android app that is used to play audio via the phone's speaker continuously, including when other apps are in the foreground, or the screen is off. There is however a problem with that app in that the audio it plays can trigger the "OK…
Mithaldu
13 votes, 2 answers

Google searches with permanent filters

I'm wondering if there's a way to make Google searches where you can set filters you want to be in effect permanently, like a filter profile. So, for instance, every time you did a search, you would get results that didn't include, say, Yahoo…
ericgrosse
13 votes, 6 answers

Google search with Python requests library

(I've tried looking but all of the other answers seem to be using urllib2) I've just started trying to use requests, but I'm still not very clear on how to send or request something additional from the page. For example, I'll have import…
James
13 votes, 2 answers

Google: Disable certain querystring in robots.txt

http://www.site.com/shop/maxi-dress?colourId=94&optId=694 and http://www.site.com/shop/maxi-dress?colourId=94&optId=694&product_type=sale: I have thousands of URLs like these, in different combinations and with different names. I also have duplicates of these URLs…
TheBlackBenzKid
13 votes, 3 answers

Google search policy

I have a question about using Google search. Can I use a custom Google search in my native iOS app instead of using their API tools, or is this a problem for Google? I wrote an HTML page (http://barzyczak.vot.pl/search.html?q=test):
Roman Barzyczak
13 votes, 8 answers

Is there a way to prevent Googlebot from indexing certain parts of a page?

Is it possible to fine-tune directives to Google to such an extent that it will ignore part of a page, yet still index the rest? There are a couple of different issues we've come across which would be helped by this, such as: RSS feed/news…
ConroyP
12 votes, 2 answers

What made Google search crash?

I was googling randomly and entered 999999..999999, which led to the linked page. It was not really a crash, but since the page said it had detected huge traffic from my computer, which was not the case, I used the term "crash". It did mention, though as…
Harsh
12 votes, 3 answers

Python - Easy way to scrape Google, download top N hits (entire .html documents) for given search?

Is there an easy way to scrape Google and write the text (just the text) of the top N (say, 1000) .html (or whatever) documents for a given search? As an example, imagine searching for the phrase "big bad wolf" and downloading just the text from the…
Georgina
12 votes, 2 answers

What JavaScript framework is Google Images using for pinch zoom?

Google has now implemented a very unique pinch zoom for their images. The viewport meta tag does not allow user scaling or zooming, and as you would expect the resulting content is not pinch-zoomable on a mobile touch device. The image, however, is…
wayofthefuture
  • 8,339
  • 7
  • 36
  • 53
12 votes, 2 answers

Is there a Google API to read cached content?

I know you can go to http://webcache.googleusercontent.com/search?q=cache:http://example.com/ to view Google's cache of any URL, but do they provide an API to hit thousands of these and pay for access? I don't want to just make HTTP GETs to these…