
So I'm using Google Custom Search (Google CSE) and I'm trying to use the refinement functionality to redirect search queries to Google Scholar.

Basically I'm following the documentation found here exactly. However, it turns out that, despite being documented, this functionality doesn't exist, and it doesn't appear that Google has any plans to implement it in the near future (see the StackOverflow post here).

My question is, does anyone have a hack/workaround for this problem, so that I could use Google CSE to search Google Scholar?

Asked by Set (edited by Community)

1 Answer


Server Side

You can use something like https://github.com/ckreibich/scholar.py to parse the results from google scholar yourself and expose it as an API that you could consume and render any way you liked.

It uses Google Scholar search under the hood. However, since this isn't an official API, it might break at any time. It also requires server-side resources to service the requests, but it gives you the nicest interface, one you have full control over.
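As a minimal sketch of the "expose it as an API" idea, assuming a plain WSGI deployment: the wrapper below serves `GET /search?q=...` as JSON. The `search_fn` callable is hypothetical; in practice it would wrap scholar.py (whose own API is not reproduced here), and it is left pluggable so the wrapper can be exercised without touching Google Scholar. Because the scraper is unofficial and may break, failures are mapped to a 502 rather than crashing the server.

```python
import json
from typing import Callable, Dict, List
from urllib.parse import parse_qs

# Hypothetical backend signature: query string in, list of result dicts out.
# In a real deployment this would call into scholar.py.
SearchFn = Callable[[str], List[Dict[str, str]]]


def _json(start_response, status: str, payload: dict):
    """Encode payload as JSON and send it with the given HTTP status."""
    body = json.dumps(payload).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(body)))])
    return [body]


def make_search_app(search_fn: SearchFn):
    """Return a WSGI app that serves GET /search?q=... as JSON."""

    def app(environ, start_response):
        if environ.get("PATH_INFO") != "/search":
            return _json(start_response, "404 Not Found", {"error": "not found"})

        # parse_qs decodes '+' and %XX escapes in the query string.
        params = parse_qs(environ.get("QUERY_STRING", ""))
        query = params.get("q", [""])[0].strip()
        if not query:
            return _json(start_response, "400 Bad Request", {"error": "missing q"})

        try:
            results = search_fn(query)
        except Exception as exc:  # the unofficial scraper may break at any time
            return _json(start_response, "502 Bad Gateway", {"error": str(exc)})

        return _json(start_response, "200 OK", {"query": query, "results": results})

    return app
```

To try it locally, the returned app can be passed to `wsgiref.simple_server.make_server("127.0.0.1", 8000, app)` and served with `serve_forever()`; your front end then renders the JSON however it likes.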

IFrame

You can open an iframe at the particular URL and embed it inside your page. It looks a bit clunkier, but it means you don't have to link externally; the results are embedded locally.

<iframe src='https://scholar.google.com/scholar?q={query}'></iframe>

See the documentation here. It may be exactly what renders well for you.

External Link

Alternatively, you can just open a new tab/window with:

<a href='https://scholar.google.com/scholar?q={query}' target='_blank' rel='noopener noreferrer'>My Link</a>
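Both the iframe and the link use a `{query}` placeholder, and the query has to be URL-encoded before it goes into the `q` parameter (spaces, `&`, quotes and so on would otherwise break the URL or the attribute). A small server-side sketch of generating that markup follows; the helper names and the default iframe size are arbitrary choices for illustration, and whether Scholar actually permits being framed is not something this code checks.

```python
from html import escape
from urllib.parse import urlencode

SCHOLAR_SEARCH = "https://scholar.google.com/scholar"


def scholar_url(query: str) -> str:
    """Build a Google Scholar search URL with the query safely encoded."""
    # urlencode turns spaces into '+' and escapes '&', '"', etc. as %XX.
    return f"{SCHOLAR_SEARCH}?{urlencode({'q': query})}"


def scholar_iframe(query: str, width: int = 800, height: int = 600) -> str:
    """Return an <iframe> tag pointing at the Scholar results page."""
    # escape() protects the HTML attribute; the URL itself is already encoded.
    return (f'<iframe src="{escape(scholar_url(query))}" '
            f'width="{width}" height="{height}"></iframe>')


def scholar_link(query: str, text: str = "Search Google Scholar") -> str:
    """Return a link that opens the Scholar results in a new tab."""
    # rel="noopener noreferrer" keeps the new tab from accessing window.opener.
    return (f'<a href="{escape(scholar_url(query))}" target="_blank" '
            f'rel="noopener noreferrer">{escape(text)}</a>')
```

For example, `scholar_url("deep learning")` yields `https://scholar.google.com/scholar?q=deep+learning`.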
Answered by Luke Exton (edited by tRuEsAtM)
  • I guess the primary reason for using CSE is that you can do bulk searching without getting CAPTCHA'd or having your IP banned. – Set Feb 14 '17 at 19:20
  • What sort of scale of bulk searching are you talking about? I would check that this isn't out of line with some TOS for using scholar, but this discussion seems to suggest the limit is quite high. https://github.com/ckreibich/scholar.py/issues/29 – Luke Exton Feb 14 '17 at 19:27
  • I need to perform several thousand queries per day. The 1 query per second claim without using multiple proxies is quite surprising to me. – Set Feb 14 '17 at 19:47
  • Is this coming from AWS/public hosting or a private IP range? I have heard of issues from public IP addresses. – Luke Exton Feb 14 '17 at 20:02
  • private IP I think. – Set Feb 14 '17 at 21:32
  • Yeah, so you might be able to get away with more from a private IP address block, since you don't run the risk of other IP addresses in the same subnet doing the same thing as you. Doing this with CSE appears to breach section 1.4 of the CSE terms of service https://support.google.com/customsearch/answer/1714300, depending on the exact implementation details and how you do the collection and rendering, but that is up to you to determine. – Luke Exton Feb 14 '17 at 22:23
  • Any automation is in breach of the ToS. Honestly, I get CAPTCHA'd sometimes just doing a few too many normal, non-automated browser searches on Google Scholar. I'm frankly incredulous that any significant amount of automated scraping would be successful. – Set Feb 15 '17 at 15:10