I'm trying to get some data from Google Webmaster Tools (GWT). I have searched through some of the API documents and implementations, but they only return a few of the data sets from GWT.

My needs:

I need to get the following data from GWT:

(1). TOP_PAGES

(2). TOP_QUERIES

(3). CRAWL_ERRORS

(4). CONTENT_ERRORS

(5). CONTENT_KEYWORDS

(6). INTERNAL_LINKS

(7). EXTERNAL_LINKS

(8). SOCIAL_ACTIVITY

After getting this data, I need to generate an Excel file for each of them.

Achieved:

I have retrieved some of this data and generated Excel files from it, namely:

(1). TOP_PAGES

(2). TOP_QUERIES

(3). INTERNAL_LINKS

(4). EXTERNAL_LINKS

(5). CONTENT_KEYWORDS

Not achieved:

I'm still not getting the major parts of the data, such as:

(1). CRAWL_ERRORS

(2). CONTENT_ERRORS

(3). SOCIAL_ACTIVITY

Code samples for your reference:

I have used two PHP files for this GWT API:

File #1: (gwtdata.php)

    <?php
    /**
     *  PHP class for downloading CSV files from Google Webmaster Tools.
     *
     *  This class does NOT require the Zend gdata package be installed
     *  in order to run.
     *
     *  Copyright 2012 eyecatchUp UG. All Rights Reserved.
     *
     *  Licensed under the Apache License, Version 2.0 (the "License");
     *  you may not use this file except in compliance with the License.
     *  You may obtain a copy of the License at
     *
     *     http://www.apache.org/licenses/LICENSE-2.0
     *
     *  Unless required by applicable law or agreed to in writing, software
     *  distributed under the License is distributed on an "AS IS" BASIS,
     *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
     *  See the License for the specific language governing permissions and
     *  limitations under the License.
     *
     *  @author: Stephan Schmitz <eyecatchup@gmail.com>
     *  @link:   https://code.google.com/p/php-webmaster-tools-downloads/
     */

     class GWTdata
     {
        const HOST = "https://www.google.com";
        const SERVICEURI = "/webmasters/tools/";

        public $_language, $_tables, $_daterange, $_downloaded, $_skipped;
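        private $_auth, $_logged_in;
        // Declared explicitly: creating dynamic properties is deprecated as of PHP 8.2.
        private $_errTablesSort, $_errTablesType;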

        public function __construct()
        {
            $this->_auth = false;
            $this->_logged_in = false;
            $this->_language = "en";
            $this->_daterange = array("","");
            $this->_tables = array("TOP_PAGES", "TOP_QUERIES",
                "CRAWL_ERRORS", "CONTENT_ERRORS", "CONTENT_KEYWORDS",
                "INTERNAL_LINKS", "EXTERNAL_LINKS", "SOCIAL_ACTIVITY"
            );
            $this->_errTablesSort = array(0 => "http",
                1 => "not-found", 2 => "restricted-by-robotsTxt",
                3 => "unreachable", 4 => "timeout", 5 => "not-followed",
                "kAppErrorSoft-404s" => "soft404", "sitemap" => "in-sitemaps"
            );
            $this->_errTablesType = array(0 => "web-crawl-errors",
                1 => "mobile-wml-xhtml-errors", 2 => "mobile-chtml-errors",
                3 => "mobile-operator-errors", 4 => "news-crawl-errors"
            );
            $this->_downloaded = array();
            $this->_skipped = array();
        }

        /**
         *  Sets content language.
         *
         *  @param $str     String   Valid ISO 639-1 language code, supported by Google.
         */
            public function SetLanguage($str)
            {
                $this->_language = $str;
            }

        /**
         *  Sets features that should be downloaded.
         *
         *  @param $arr     Array   Valid array values are:
         *                          "TOP_PAGES", "TOP_QUERIES", "CRAWL_ERRORS", "CONTENT_ERRORS",
         *                          "CONTENT_KEYWORDS", "INTERNAL_LINKS", "EXTERNAL_LINKS",
         *                          "SOCIAL_ACTIVITY".
         */
            public function SetTables($arr)
            {
                // No size cap: every entry is validated against $valid below.
                if(is_array($arr) && !empty($arr)) {
                    $valid = array("TOP_PAGES","TOP_QUERIES","CRAWL_ERRORS","CONTENT_ERRORS",
                      "CONTENT_KEYWORDS","INTERNAL_LINKS","EXTERNAL_LINKS","SOCIAL_ACTIVITY");
                    $this->_tables = array();
                    for($i=0; $i < sizeof($arr); $i++) {
                        if(in_array($arr[$i], $valid)) {
                            array_push($this->_tables, $arr[$i]);
                        } else { throw new Exception("Invalid argument given."); }
                    }
                } else { throw new Exception("Invalid argument given."); }
            }

        /**
         *  Sets daterange for download data.
         *
         *  @param $arr     Array   Array containing two ISO 8601 formatted date strings.
         */
            public function SetDaterange($arr)
            {
                if(is_array($arr) && !empty($arr) && sizeof($arr) == 2) {
                    if(self::IsISO8601($arr[0]) === true &&
                      self::IsISO8601($arr[1]) === true) {
                        $this->_daterange = array(str_replace("-", "", $arr[0]),
                          str_replace("-", "", $arr[1]));
                        return true;
                    } else { throw new Exception("Invalid argument given."); }
                } else { throw new Exception("Invalid argument given."); }
            }

        /**
         *  Returns array of downloaded filenames.
         *
         *  @return  Array   Array of filenames that have been written to disk.
         */
            public function GetDownloadedFiles()
            {
                return $this->_downloaded;
            }

        /**
         *  Returns array of skipped filenames.
         *
         *  @return  Array   Array of filenames that could not be downloaded or written to disk.
         */
            public function GetSkippedFiles()
            {
                return $this->_skipped;
            }

        /**
         *  Checks if client has logged into their Google account yet.
         *
         *  @return Boolean  Returns true if logged in, or false if not.
         */
            private function IsLoggedIn()
            {
                return $this->_logged_in;
            }

        /**
         *  Attempts to log into the specified Google account.
         *
         *  @param $email  String   User's Google email address.
         *  @param $pwd    String   Password for Google account.
         *  @return Boolean  Returns true when Authentication was successful,
         *                   else false.
         */
            public function LogIn($email, $pwd)
            {
                $url = self::HOST . "/accounts/ClientLogin";
                $postRequest = array(
                    'accountType' => 'HOSTED_OR_GOOGLE',
                    'Email' => $email,
                    'Passwd' => $pwd,
                    'service' => "sitemaps",
                    'source' => "Google-WMTdownloadscript-0.1-php"
                );
                $ch = curl_init();
                curl_setopt($ch, CURLOPT_URL, $url);
                curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
                curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 30);
                curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, true); // verify Google's certificate: this request carries the account password
                curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
                curl_setopt($ch, CURLOPT_POST, true);
                curl_setopt($ch, CURLOPT_POSTFIELDS, $postRequest);
                $output = curl_exec($ch);
                $info = curl_getinfo($ch);
                curl_close($ch);
                if($info['http_code'] == 200) {
                    preg_match('/Auth=(.*)/', $output, $match);
                    if(isset($match[1])) {
                        $this->_auth = $match[1];
                        $this->_logged_in = true;
                        return true;
                    } else { return false; }
                } else { return false; }
            }

        /**
         *  Attempts authenticated GET Request.
         *
         *  @param $url    String   URL for the GET request.
         *  @return Mixed  Curl result as String,
         *                 or false (Boolean) when Authentication fails.
         */
            public function GetData($url)
            {
                if(self::IsLoggedIn() === true) {
                    $url = self::HOST . $url;
                    $head = array("Authorization: GoogleLogin auth=".$this->_auth,
                        "GData-Version: 2");
                    $ch = curl_init();
                    curl_setopt($ch, CURLOPT_URL, $url);
                    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
                    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 30);
                    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, true); // verify Google's certificate: this request carries the auth token
                    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
                    curl_setopt($ch, CURLOPT_ENCODING, ""); // "" = accept every encoding cURL supports
                    curl_setopt($ch, CURLOPT_HTTPHEADER, $head);
                    $result = curl_exec($ch);
                    $info = curl_getinfo($ch);
                    curl_close($ch);
                    return ($info['http_code']!=200) ? false : $result;
                } else { return false; }
            }

        /**
         *  Gets all available sites from Google Webmaster Tools account.
         *
         *  @return Mixed  Array with all site URLs registered in GWT account,
         *                 or false (Boolean) if request failed.
         */
            public function GetSites()
            {
                if(self::IsLoggedIn() === true) {
                    $feed = self::GetData(self::SERVICEURI."feeds/sites/");
                    if($feed !== false) {
                        $sites = array();
                        $doc = new DOMDocument();
                        $doc->loadXML($feed);
                        foreach ($doc->getElementsByTagName('entry') as $node) {
                            array_push($sites,
                              $node->getElementsByTagName('title')->item(0)->nodeValue);
                        }
                        return $sites;
                    } else { return false; }
                } else { return false; }
            }

        /**
         *  Gets the download links for an available site
         *  from the Google Webmaster Tools account.
         *
         *  @param $url    String   Site URL registered in GWT.
         *  @return Mixed  Array with keys TOP_PAGES and TOP_QUERIES,
         *                 or false (Boolean) when Authentication fails.
         */
            public function GetDownloadUrls($url)
            {
                if(self::IsLoggedIn() === true) {
                    $_url = sprintf(self::SERVICEURI."downloads-list?hl=%s&siteUrl=%s",
                      $this->_language,
                      urlencode($url));
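                    $downloadList = self::GetData($_url);
                    // GetData() returns false on a non-200 response; don't pass that to json_decode().
                    return ($downloadList === false) ? false : json_decode($downloadList, true);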
                } else { return false; }
            }

        /**
         *  Downloads the file based on the given URL.
         *
         *  @param $site    String   Site URL available in GWT Account.
         *  @param $savepath  String   Optional path to save CSV to (no trailing slash!).
         */
            public function DownloadCSV($site, $savepath=".")
            {
                if(self::IsLoggedIn() === true) {
                    $downloadUrls = self::GetDownloadUrls($site);
                    $filename = parse_url($site, PHP_URL_HOST) ."-". date("Ymd-His");
                    $tables = $this->_tables;
                    foreach($tables as $table) {
                        if($table=="CRAWL_ERRORS") {
                            self::DownloadCSV_CrawlErrors($site, $savepath);
                        }
                        elseif($table=="CONTENT_ERRORS") {
                            self::DownloadCSV_XTRA($site, $savepath,
                              "html-suggestions", "\)", "CONTENT_ERRORS", "content-problems-dl");
                        }
                        elseif($table=="CONTENT_KEYWORDS") {
                            self::DownloadCSV_XTRA($site, $savepath,
                              "keywords", "\)", "CONTENT_KEYWORDS", "content-words-dl");
                        }
                        elseif($table=="INTERNAL_LINKS") {
                            self::DownloadCSV_XTRA($site, $savepath,
                              "internal-links", "\)", "INTERNAL_LINKS", "internal-links-dl");
                        }
                        elseif($table=="EXTERNAL_LINKS") {
                            self::DownloadCSV_XTRA($site, $savepath,
                              "external-links-domain", "\)", "EXTERNAL_LINKS", "external-links-domain-dl");
                        }
                        elseif($table=="SOCIAL_ACTIVITY") {
                            self::DownloadCSV_XTRA($site, $savepath,
                              "social-activity", "x26", "SOCIAL_ACTIVITY", "social-activity-dl");
                        }
                        else {
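                            $finalName = "$savepath/$table-$filename.csv";
                            if(!isset($downloadUrls[$table])) {
                                // No download link was returned for this table; record it as skipped.
                                array_push($this->_skipped, $finalName);
                                continue;
                            }
                            $finalUrl = $downloadUrls[$table] ."&prop=ALL&db=%s&de=%s&more=true";
                            $finalUrl = sprintf($finalUrl, $this->_daterange[0], $this->_daterange[1]);
                            self::SaveData($finalUrl,$finalName);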
                        }
                    }
                } else { return false; }
            }

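        /**
         *  Downloads "unofficial" downloads based on the given URL.
         *
         *  @param $site            String   Site URL available in GWT Account.
         *  @param $savepath        String   Path to save CSV to (no trailing slash!). Required, since an
         *                                   optional parameter before required ones is deprecated as of PHP 8.0.
         *  @param $tokenUri        String   Page URI the security token is scraped from.
         *  @param $tokenDelimiter  String   Trailing delimiter passed to GetToken().
         *  @param $filenamePrefix  String   Prefix for the saved CSV filename.
         *  @param $dlUri           String   URI of the CSV download endpoint.
         */
            public function DownloadCSV_XTRA($site, $savepath, $tokenUri, $tokenDelimiter, $filenamePrefix, $dlUri)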
            {
                if(self::IsLoggedIn() === true) {
                    $uri = self::SERVICEURI . $tokenUri . "?hl=%s&siteUrl=%s";
                    $_uri = sprintf($uri, $this->_language, $site);
                    $token = self::GetToken($_uri, $tokenDelimiter);
                    $filename = parse_url($site, PHP_URL_HOST) ."-". date("Ymd-His");
                    $finalName = "$savepath/$filenamePrefix-$filename.csv";
                    $url = self::SERVICEURI . $dlUri . "?hl=%s&siteUrl=%s&security_token=%s&prop=ALL&db=%s&de=%s&more=true";
                    $_url = sprintf($url, $this->_language, $site, $token, $this->_daterange[0], $this->_daterange[1]);
                    self::SaveData($_url,$finalName);
                } else { return false; }
            }

        /**
         *  Downloads the Crawl Errors file based on the given URL.
         *
         *  @param $site    String   Site URL available in GWT Account.
         *  @param $savepath  String   Optional: Path to save CSV to (no trailing slash!).
         *  @param $separated Boolean  Optional: If true, the method saves separated CSV files
         *                             for each error type. Default: Merge errors in one file.
         */
            public function DownloadCSV_CrawlErrors($site, $savepath=".", $separated=false)
            {
                if(self::IsLoggedIn() === true) {
                    $type_param = "we";
                    $filename = parse_url($site, PHP_URL_HOST) ."-". date("Ymd-His");
                    if($separated) {
                        foreach($this->_errTablesSort as $sortid => $sortname) {
                            foreach($this->_errTablesType as $typeid => $typename) {
                                if($typeid == 1) {
                                    $type_param = "mx";
                                } else if($typeid == 2) {
                                    $type_param = "mc";
                                } else {
                                    $type_param = "we";
                                }
                                $uri = self::SERVICEURI."crawl-errors?hl=en&siteUrl=$site&tid=$type_param";
                                $token = self::GetToken($uri,"x26");
                                $finalName = "$savepath/CRAWL_ERRORS-$typename-$sortname-$filename.csv";
                                $url = self::SERVICEURI."crawl-errors-dl?hl=%s&siteUrl=%s&security_token=%s&type=%s&sort=%s";
                                $_url = sprintf($url, $this->_language, $site, $token, $typeid, $sortid);
                                self::SaveData($_url,$finalName);
                            }
                        }
                    }
                    else {
                        $uri = self::SERVICEURI."crawl-errors?hl=en&siteUrl=$site&tid=$type_param";
                        $token = self::GetToken($uri,"x26");
                        $finalName = "$savepath/CRAWL_ERRORS-$filename.csv";
                        $url = self::SERVICEURI."crawl-errors-dl?hl=%s&siteUrl=%s&security_token=%s&type=0";
                        $_url = sprintf($url, $this->_language, $site, $token);
                        self::SaveData($_url,$finalName);
                    }
                } else { return false; }
            }

        /**
         *  Saves data to a CSV file based on the given URL.
         *
         *  @param $finalUrl   String   CSV Download URI.
         *  @param $finalName  String   Path of the file the data is saved to.
         */
            private function SaveData($finalUrl, $finalName)
            {
                $data = self::GetData($finalUrl);
                if(strlen($data) > 1 && file_put_contents($finalName, utf8_decode($data))) {
                    array_push($this->_downloaded, realpath($finalName));
                    return true;
                } else {
                    array_push($this->_skipped, $finalName);
                    return false;
                }
            }

        /**
         *  Regular Expression to find the Security Token for a download file.
         *
         *  @param $uri        String   A Webmaster Tools Desktop Service URI.
         *  @param $delimiter  String   Trailing delimiter for the regex.
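         *  @return  Mixed     Security token (String), or false (Boolean) if none was found.
         */
            private function GetToken($uri, $delimiter)
            {
                $matches = array();
                $tmp = self::GetData($uri);
                //preg_match_all("#x26security_token(.*?)$delimiter#si", $tmp, $matches);
                preg_match_all("#46security_token(.*?)$delimiter#si", $tmp, $matches);
                // No match (e.g. after a change to the GWT page markup):
                // return false instead of reading an undefined index.
                if(empty($matches[1][0])) { return false; }
                //return substr($matches[1][0],4,-1);
                return substr($matches[1][0],3,-1);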
            }

        /**
         *  Validates ISO 8601 date format.
         *
         *  @param $str      String   Valid ISO 8601 date string (eg. 2012-01-01).
         *  @return  Boolean   Returns true if string has valid format, else false.
         */
            private function IsISO8601($str)
            {
                $stamp = strtotime($str);
                return (is_numeric($stamp) && checkdate(date('m', $stamp),
                      date('d', $stamp), date('Y', $stamp))) ? true : false;
            }
     }
    ?>
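
All of the download methods above depend on `GetToken()`, which scrapes a `security_token` value out of the HTML of a Webmaster Tools page with a regular expression. When Google changes that markup, the pattern stops matching and the corresponding downloads end up in the skipped list. A minimal sketch of a more defensive matching step is below. The function name `extractSecurityToken` is hypothetical, and the exact prefix and delimiter surrounding the token in the page source are assumptions you would need to confirm by viewing the page source yourself.

```php
<?php
// Hypothetical helper: a defensive variant of GetToken()'s matching step.
// Returns the text between $prefix and $delimiter, or null if not found.
function extractSecurityToken(string $html, string $prefix, string $delimiter): ?string
{
    // preg_quote() escapes regex metacharacters (and the '#' delimiter),
    // so $prefix and $delimiter are matched literally.
    $pattern = '#' . preg_quote($prefix, '#') . '(.*?)' . preg_quote($delimiter, '#') . '#s';
    if (preg_match($pattern, $html, $m) !== 1 || $m[1] === '') {
        return null; // markup changed or token missing: let the caller decide
    }
    return $m[1];
}
```

Returning `null` lets the caller skip or log the table explicitly instead of requesting a download URL with an empty token.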

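File #2: (index.php)

    <?php
    include 'gwtdata.php';
    include 'credentials.php'; // expected to define $email and $password
    try {
        $website = "http://www.yourdomain.com/"; /* Add your website URL */
        $gdata = new GWTdata();
        if ($gdata->LogIn($email, $password) === true) {
            /* Add the folder path to save the CSV files with GWT data (no trailing slash) */
            $gdata->DownloadCSV($website, "/path/to/csv/folder");
            // Report what was actually saved instead of assuming success.
            echo "Downloaded:\n" . implode("\n", $gdata->GetDownloadedFiles()) . "\n";
            echo "Skipped:\n" . implode("\n", $gdata->GetSkippedFiles()) . "\n";
        } else {
            echo "Login failed.";
        }
    } catch (Exception $e) {
        die($e->getMessage());
    }
    ?>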

Can anyone help me retrieve all of this data and generate Excel files from it using PHP?
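
For the Excel part, one option that needs no third-party library is to wrap each downloaded CSV in Excel 2003 XML (SpreadsheetML), which Excel opens directly. The sketch below assumes the CSV is UTF-8 and comma-separated; `csvToSpreadsheetXml` is a hypothetical helper name. Note that `SaveData()` in the class above runs `utf8_decode()`, which converts to ISO-8859-1, so you would either drop that call or convert the files back to UTF-8 first; otherwise non-ASCII characters are replaced by U+FFFD here. If you need real `.xlsx` files, a library such as PhpSpreadsheet is the usual choice.

```php
<?php
// Sketch: convert one CSV file into an Excel 2003 XML (SpreadsheetML) string
// using only core PHP. Assumes UTF-8, comma-separated input.
function csvToSpreadsheetXml(string $csvPath, string $sheetName): string
{
    $handle = fopen($csvPath, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $csvPath");
    }
    // Escape for XML; invalid UTF-8 sequences become U+FFFD instead of "".
    $esc = fn(string $s): string => htmlspecialchars($s, ENT_XML1 | ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

    $rowsXml = '';
    // Explicit escape "" = plain RFC 4180 quoting (no backslash escaping).
    while (($fields = fgetcsv($handle, 0, ',', '"', '')) !== false) {
        if ($fields === [null]) {
            continue; // fgetcsv() returns [null] for blank lines
        }
        $rowsXml .= '   <Row>';
        foreach ($fields as $value) {
            $value = (string) $value;
            // Type numeric cells as Number so Excel can sort and sum them.
            $type = is_numeric($value) ? 'Number' : 'String';
            $rowsXml .= '<Cell><Data ss:Type="' . $type . '">' . $esc($value) . '</Data></Cell>';
        }
        $rowsXml .= "</Row>\n";
    }
    fclose($handle);

    return '<?xml version="1.0" encoding="UTF-8"?>' . "\n"
        . '<?mso-application progid="Excel.Sheet"?>' . "\n"
        . '<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"'
        . ' xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">' . "\n"
        // Excel limits worksheet names to 31 characters.
        . ' <Worksheet ss:Name="' . $esc(substr($sheetName, 0, 31)) . '">' . "\n"
        . "  <Table>\n" . $rowsXml . "  </Table>\n"
        . " </Worksheet>\n</Workbook>\n";
}
```

For example, `file_put_contents('TOP_QUERIES.xml', csvToSpreadsheetXml('/path/to/TOP_QUERIES-example.csv', 'TOP_QUERIES'));` writes a file that Excel can open.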

John Peter
  • It would be nice to share your efforts. Why can you get the first 5 but not the last 3? Is there something blocking your script, or do you get an error? Where is your code to get the first 5, so we can try to alter it to work for the latter? – Hugo Delsing Mar 25 '13 at 10:07
  • @Hugo Delsing Here is the code sample I have used for this: http://php-webmaster-tools-downloads.googlecode.com/files/gwtdata.php – John Peter Mar 25 '13 at 10:41
  • Keep up your smart work, until you don't have / know the answer. Thanks for your great appreciation, guys: "John Conde, Jocelyn, mvp, A.V, Bakudan". It was a nice team effort, closing my question instead of giving a better answer. – John Peter Mar 26 '13 at 08:52
  • As can be seen in [the revision history](http://stackoverflow.com/posts/15611372/revisions), your question was closed BEFORE you added any code sample. Now that you added some code to the question, it looks a bit more like a real question (according to the FAQ). So, you are the only one to blame here. – Jocelyn Mar 26 '13 at 23:49
  • @Jocelyn Nice move friend. – John Peter Mar 27 '13 at 07:25
  • GWT is Google Web Toolkit, not Webmaster Tools... – Brady Moritz Sep 13 '14 at 23:58

2 Answers

[..] I have searched some of the API Documents and Implements, [..]
[..] I have used two files in PHP for this GWT API, [..]

I am the author of the code you quote (the GWTdata PHP class), and first off I want to make clear that this code is neither released by Google nor uses an official API; rather, it is a custom script that processes data from the web interface.

[..] returning few of the datas only from the GWT. [..]

A couple of weeks ago, Google made some changes to the Google Webmaster Tools web interface (which, again, was/is what the class uses to process data requests). These changes broke some functionality of the GWTdata PHP class, such as downloading the crawl errors.

[..] Can anyone help me in this, to achieve all those datas and make it as excel file to generate using PHP. [..]

Unfortunately, for most of the data there is nothing I/we can do about it, since it is simply no longer accessible.

[..] Still I'm not getting the major parts / datas like,
1. Crawl errors [..]

Anyway, you can use this follow-up project to get the crawl errors:

GwtCrawlErrors (Download website crawl errors from Google Webmaster Tools as CSV):
https://github.com/eyecatchup/GWT_CrawlErrors-php

eyecatchUp
  • Very useful lib. Is there a way to use OAuth instead of direct login details? – Gowri Jul 02 '14 at 09:41
  • @gowri Yes, you can use OAuth, of course, but you'd need to add it to the code yourself. I haven't found the time to implement it yet. – eyecatchUp Jul 10 '14 at 08:25
  • @eyecatchUp how can you use OAuth? It seems that this uses Google SSO service, which is distinct from the API (where OAuth support is available). Or am I missing something? – jwadsack Jan 21 '15 at 21:51
  • @eyecatchUp can you help me with this: http://stackoverflow.com/questions/28835484/get-queries-by-landing-page-using-google-webmaster-tools-api ?? – LIGHT Mar 04 '15 at 05:37
  • Hi @eyecatchUp. Are you planning on updating the login method to OAuth 2.0? Google is deprecating ClientLogin starting April 20, 2015. Please let us know. – Isidro Moran Mar 27 '15 at 20:35
  • @IsidroMoran Dammit.. I'll try to update it during the forthcoming weekend, but maybe it will be next week. Sorry, my fault. I thought I had one more month (till May 20th)?! – eyecatchUp Apr 15 '15 at 22:05

The Google API Client for PHP now supports the Webmasters API. Documentation for the PHP library is (as usual) scarce, but it maps reasonably cleanly onto the methods described in the Webmasters API reference, and there are some examples in the code, so it's not too hard to get to grips with.

El Yobo
  • The official API allows you to manage verified sites and read crawl error statistics. It doesn't include top search queries, top pages, HTML improvements and other items the OP requested. – jwadsack Jan 21 '15 at 21:54
  • This answer is now the best since the Google API now supports Search Analytics queries. https://developers.google.com/webmaster-tools/v3/searchanalytics – Ciseur Jan 07 '16 at 09:46
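
As a rough sketch of the official route described in this answer and the last comment: the code below assumes google/apiclient v2.x installed with Composer (`composer require google/apiclient:^2.0`) and a service account JSON key whose e-mail address has been added as a user of the property in Search Console. It uses the v2 class names `Google_Client`, `Google_Service_Webmasters` and `Google_Service_Webmasters_SearchAnalyticsQueryRequest`; newer releases place these under the `Google\Service\...` namespace, so adjust the names to the version you install. A `"query"` dimension roughly corresponds to TOP_QUERIES and `"page"` to TOP_PAGES. The helper names `fetchSearchAnalyticsRows` and `writeCsv` are hypothetical.

```php
<?php
// Sketch, assuming google/apiclient v2.x (non-namespaced class aliases)
// and a service account key with access to the Search Console property.

/**
 * Runs a Search Analytics query and returns one flat array per row:
 * the dimension keys followed by clicks, impressions, CTR and position.
 */
function fetchSearchAnalyticsRows(string $keyFile, string $siteUrl, string $startDate,
                                  string $endDate, array $dimensions): array
{
    require_once __DIR__ . '/vendor/autoload.php';

    $client = new Google_Client();
    $client->setAuthConfig($keyFile);
    $client->addScope(Google_Service_Webmasters::WEBMASTERS_READONLY);
    $service = new Google_Service_Webmasters($client);

    $request = new Google_Service_Webmasters_SearchAnalyticsQueryRequest();
    $request->setStartDate($startDate);   // "YYYY-MM-DD"
    $request->setEndDate($endDate);
    $request->setDimensions($dimensions); // e.g. ["query"] or ["page"]
    $request->setRowLimit(1000);

    $response = $service->searchanalytics->query($siteUrl, $request);
    $rows = [];
    foreach ($response->getRows() ?? [] as $row) {
        $rows[] = array_merge($row->getKeys(), [
            $row->getClicks(), $row->getImpressions(), $row->getCtr(), $row->getPosition(),
        ]);
    }
    return $rows;
}

/** Writes a header line plus rows to a CSV file. */
function writeCsv(string $path, array $header, array $rows): void
{
    $handle = fopen($path, 'w');
    if ($handle === false) {
        throw new RuntimeException("Cannot write $path");
    }
    // Explicit escape "" = plain RFC 4180 quoting.
    fputcsv($handle, $header, ',', '"', '');
    foreach ($rows as $row) {
        fputcsv($handle, $row, ',', '"', '');
    }
    fclose($handle);
}
```

Example usage: `writeCsv('top_queries.csv', ['query', 'clicks', 'impressions', 'ctr', 'position'], fetchSearchAnalyticsRows('key.json', 'http://www.yourdomain.com/', '2016-01-01', '2016-01-31', ['query']));`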