How would I go about writing a program that can take articles from Google News and download them to my computer?

I've found that Google News already has a built-in RSS feature, but I need to actually download the entire article (text and all) rather than just a headline.

Preferably, I'd like to download these articles as PDFs or HTML files, but for starters just fetching some URLs would be amazing.

There have been some questions on here about fetching articles from Google News, but nothing I've found so far has been particularly helpful. Any help would be massively appreciated.

Thanks!

byang12
  • Have you any code to share, with what you've attempted so far, and a specific problem area? An answer explaining how to "go about writing a program" would be more involved than what StackOverflow is designed for. – William Price Sep 12 '14 at 02:51

1 Answer

Legal issues aside, this is possible; see Apache HttpComponents. Here is an example (taken from here) of how to use its HTTP client to request a URL:

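import org.apache.http.HttpHost;
import org.apache.http.HttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.conn.params.ConnRoutePNames;
import org.apache.http.impl.client.DefaultHttpClient;

// HttpClient 4.x API (DefaultHttpClient is deprecated since 4.3).
// useProxy, proxyStr, urlStr and encodedAuth are supplied by the caller.
DefaultHttpClient httpclient = new DefaultHttpClient();
if (useProxy) {
    HttpHost proxy = new HttpHost(proxyStr, 80, "http");
    httpclient.getParams().setParameter(ConnRoutePNames.DEFAULT_PROXY, proxy);
}

HttpGet httpget = new HttpGet(urlStr);
// Only needed if the target server requires HTTP Basic authentication.
httpget.addHeader("Authorization", "Basic " + encodedAuth);

// Execute the GET; the page content is available via response.getEntity().
HttpResponse response = httpclient.execute(httpget);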

But be aware of Google's Terms of Service before you do anything like this.
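
As a rough sketch of the whole flow described in the question (read the RSS feed, pull out the article URLs, save each page as an HTML file), something like the following could work. It uses the JDK's built-in java.net.http.HttpClient (Java 11+) instead of Apache HttpComponents so that it has no external dependencies. The class name NewsFetcher, the helper names, the output file names, and the assumption that the feed is plain RSS 2.0 with one <link> per <item> are all illustrative choices on my part, not anything Google documents.

```java
import java.io.ByteArrayInputStream;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper class: downloads every article linked from an RSS feed.
final class NewsFetcher {

    // Returns the text of the first <link> inside each <item> of an RSS document.
    // Assumes RSS 2.0 structure; the channel-level <link> is deliberately skipped.
    static List<String> extractLinks(String rssXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(rssXml.getBytes(StandardCharsets.UTF_8)));
            NodeList items = doc.getElementsByTagName("item");
            List<String> links = new ArrayList<>();
            for (int i = 0; i < items.getLength(); i++) {
                Element item = (Element) items.item(i);
                NodeList linkNodes = item.getElementsByTagName("link");
                if (linkNodes.getLength() > 0) {
                    links.add(linkNodes.item(0).getTextContent().trim());
                }
            }
            return links;
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a parseable RSS document", e);
        }
    }

    // Performs a GET request and returns the response body as a string.
    static String fetch(HttpClient client, String url) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java NewsFetcher.java <rss-feed-url>");
            return;
        }
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        List<String> links = extractLinks(fetch(client, args[0]));
        for (int i = 0; i < links.size(); i++) {
            // Output file names (article-0.html, ...) are an arbitrary choice.
            Path out = Path.of("article-" + i + ".html");
            Files.writeString(out, fetch(client, links.get(i)));
            System.out.println(links.get(i) + " -> " + out);
        }
    }
}
```

You would run it with the feed URL as the only argument, e.g. java NewsFetcher.java followed by the RSS URL you found. Note that the links in an aggregator's feed may point at a redirect or intermediate page rather than the publisher's article, in which case the saved HTML will need further handling. Converting the result to PDF is a separate step that this sketch does not attempt.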

Martin Capodici