
I noticed that iTunes Preview allows you to crawl and scrape pages via the http:// protocol. However, many of the links try to open in iTunes rather than in the browser. For example, when you go to the iBooks page, it immediately tries to open a URL with the itms:// protocol.

Are there any other methods of crawling the App Store or is this the only way?

Can the itms:// protocol links themselves be crawled somehow?

Senseful

4 Answers


I would take a good look at the iTunes Search API and the iTunes Enterprise Partner Feed (EPF).

You might get most or all of the information you need in a convenient JSON format.
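As a rough illustration, a Search API query is just a URL with query parameters. The sketch below assumes the public endpoint `https://itunes.apple.com/search` with its `term`, `country`, `entity` and `limit` parameters; `itunes_search_url` is a hypothetical helper name, and you should check the parameter list against Apple's Search API documentation.

```php
<?php
// Hypothetical helper: build an iTunes Search API URL that returns App Store apps.
// Assumes the documented /search endpoint and its term/country/entity/limit parameters.
function itunes_search_url(string $term, string $country = 'us', int $limit = 10): string
{
    $params = [
        'term'    => $term,
        'country' => $country,
        'entity'  => 'software', // restrict results to iOS apps
        'limit'   => $limit,
    ];
    // http_build_query URL-encodes each value (spaces become '+').
    return 'https://itunes.apple.com/search?' . http_build_query($params);
}

echo itunes_search_url('ibooks'), PHP_EOL;
```

Fetching that URL (for example with `file_get_contents`) returns a JSON document that `json_decode` can turn into a PHP array.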

If you can't get the information you need with the API, I would be interested to know what it is :)

philipp

As philipp mentioned, the iTunes Search API is an easy way to retrieve data about your App Store listings in JSON format.

Simply query it with your app ID (you can find the ID by viewing the web listing for your app at itunes.apple.com), e.g.:

http://itunes.apple.com/lookup?id=INSERT_YOUR_APP_ID_HERE

Then parse the resulting JSON to your heart's content.
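A minimal sketch of that lookup-and-parse step, assuming the same lookup endpoint served over https. The response field names used here (`resultCount`, `results`, `trackName`, `version`) are assumptions based on typical lookup replies, the helper names are hypothetical, and the sample JSON is a hand-written fragment, not real App Store data.

```php
<?php
// Hypothetical helper: build the lookup URL for a single app ID.
function itunes_lookup_url(int $appId): string
{
    return 'https://itunes.apple.com/lookup?' . http_build_query(['id' => $appId]);
}

// Hypothetical helper: extract a few fields from a lookup reply.
// Field names are assumed; compare them with a real response before relying on them.
function parse_lookup(string $json): ?array
{
    $data = json_decode($json, true);
    if (!is_array($data) || ($data['resultCount'] ?? 0) < 1) {
        return null; // invalid JSON, or no app with that ID
    }
    $app = $data['results'][0];
    return [
        'name'    => $app['trackName'] ?? null,
        'version' => $app['version'] ?? null,
    ];
}

// Offline example with a hand-written fragment (not real data).
$sample = '{"resultCount":1,"results":[{"trackName":"Example App","version":"1.2"}]}';
print_r(parse_lookup($sample));

// Against the live service, something like:
// $info = parse_lookup(file_get_contents(itunes_lookup_url(123456789)));
```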

DiscDev

The only difference between http:// links and itms:// links is that you need to set your User-Agent to an iTunes user agent. Depending on the iTunes version, you may also have to include a validation code generated by a not-so-secret algorithm.

For example, this is the code for iTunes 9:

# Some magic. Generates a seed we use for X-Apple-Validation. Adapted from LWP::UserAgent::iTMS_Client.
function comp_seed($url, $user_agent) {
    # Two random 16-bit values as 8 hex digits (an upper bound of 0xFFFF keeps each to 4 digits).
    $random  = sprintf( "%04X%04X", rand(0,0xFFFF), rand(0,0xFFFF) );
    # Static secret mixed into the hash.
    $static  = base64_decode("ROkjAaKid4EUF5kGtTNn3Q==");
    # The URL from its last '/' onward, or '?' if the URL has too few slashes.
    $url_end = ( preg_match("|.*/.*/.*(/.+)$|",$url,$matches)) ? $matches[1] : '?';
    $digest  = md5(implode("", array($url_end, $user_agent, $static, $random)));
    return $random . '-' . strtoupper($digest);
}
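To fetch an itms:// link over HTTP as described above, you could rewrite the scheme and send the iTunes User-Agent (plus the validation value, if your iTunes version needs it). This is only a sketch: it assumes an itms:// or itms-apps:// link carries the same host and path as its http(s):// counterpart, the User-Agent string is a hypothetical placeholder rather than a real iTunes value, and the helper names are made up for illustration.

```php
<?php
// Hypothetical helper: swap the itms:// or itms-apps:// scheme for https://,
// keeping host and path unchanged (assumption: they match the web URL).
function itms_to_https(string $url): string
{
    return preg_replace('#^itms(-apps)?://#i', 'https://', $url);
}

// Hypothetical helper: stream context that sends an iTunes-style User-Agent and,
// optionally, an X-Apple-Validation value (e.g. the result of comp_seed()).
function itunes_context(string $userAgent, ?string $validation = null)
{
    $headers = ["User-Agent: $userAgent"];
    if ($validation !== null) {
        $headers[] = "X-Apple-Validation: $validation";
    }
    // The https:// wrapper reads its options from the 'http' key.
    return stream_context_create(['http' => ['header' => implode("\r\n", $headers)]]);
}

$url = itms_to_https('itms://itunes.apple.com/app/id123456789');
$ctx = itunes_context('iTunes/9.0 (placeholder)');
echo $url, PHP_EOL;
// $html = file_get_contents($url, false, $ctx);
```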

However, if you are only scraping, iTunes Preview should work for your purposes: the iBooks page you linked to had more than enough information to scrape.
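For the scraping route, PHP's built-in DOM extension can parse a preview page once you have its HTML. The sketch below assumes the page exposes Open Graph `<meta property="og:...">` tags; that is an assumption about Apple's markup, so inspect a real page and adjust the XPath. `og_tags` is a hypothetical helper name and the sample HTML is hand-written.

```php
<?php
// Hypothetical helper: collect Open Graph meta tags from an HTML page.
function og_tags(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; keep libxml warnings out of the output.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $tags = [];
    foreach ($xpath->query('//meta[starts-with(@property, "og:")]') as $meta) {
        $tags[$meta->getAttribute('property')] = $meta->getAttribute('content');
    }
    return $tags;
}

$html = '<html><head><meta property="og:title" content="iBooks"></head><body></body></html>';
print_r(og_tags($html));
```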

Adam M-W

We tried scraping ourselves too about a year ago, and it just became too much of a headache. Philipp's answer is a good one: Apple's enterprise feed (you need to apply for it with a legitimate use case) does have a good amount of the useful info you might be after when scraping.

There are also a few companies that offer the data as a service: abto and AppMonsta are two I heard of when I was looking. I can't seem to find abto anymore, but http://appmonsta.com seems to still be around. The Search API looks OK (I never experimented with it) but limited.

Good luck!