
I'm parsing S3 logs to identify requests made from iMessage previews (on macOS Sierra & iOS 10).

There are a few common types of UA strings, but I can't tell which are from the browser vs. iMessage. I'm hoping iMessage sends a UA that's distinct from Safari's:

  1. `AppleCoreMedia/1.0.0.14B100 (iPhone; U; CPU OS 10_1_1 like Mac OS X; en_us)`

^ Gotta be iOS Safari, right?

  2. `MobileSMS/1.0 CFNetwork/808.1.4 Darwin/16.1.0`

^ I think MobileSMS means iMessage (hopefully)

  3. `Mozilla/5.0 (iPhone; CPU iPhone OS 10_1_1 like Mac OS X) AppleWebKit/602.1.32 (KHTML, like Gecko) Mobile/14B100 Twitter for iPhone`

^ Twitter via WebKit webview?

  4. `Mozilla/5.0 (iPhone; CPU iPhone OS 10_1_1 like Mac OS X) AppleWebKit/602.1.50 (KHTML, like Gecko) CriOS/54.0.2840.91 Mobile/14B100 Safari/602.1`

^ more iOS Chrome (I see you, CriOS)

  5. `Mozilla/5.0 (iPhone; CPU iPhone OS 10_1_1 like Mac OS X) AppleWebKit/602.2.14 (KHTML, like Gecko) Version/10.0 Mobile/14B100 Safari/602.1`

^ more iOS Chrome?

  6. `Twitter/5002568 CFNetwork/760.6.3 Darwin/15.6.0 (x86_64)`

^ Twitter

nealrs

2 Answers


I just ran into the same issue while trying to re-route the iMessage crawler to a non-Angular page that generates the correct meta tags for it, and found this question in the process. I figured I'd write an answer now that I've worked it out. Apple's documentation mentions only the ...(Applebot/x.x) User-Agent, which is not the correct one. I found nothing else online, so I logged traffic to a single file on a public server and shared the link via iMessage. The log showed:

Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_1) AppleWebKit/601.2.4 (KHTML, like Gecko) Version/9.0.1 Safari/601.2.4 facebookexternalhit/1.1 Facebot Twitterbot/1.0

Once I modified my regex to match this User-Agent, the iMessage crawler was finally re-routed to the correct location.

So there it is. It's probably subject to change without notice, since as far as I can tell it doesn't appear in any official Apple documentation, but I hope this still helps someone :)
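For anyone sorting log entries by hand, here is a minimal Python sketch that puts the string above next to the UAs from the question. The function name `classify_user_agent` and the labels it returns are made up for illustration, and the rules are heuristics based only on the tokens visible in these strings (for example, treating `MobileSMS/` as the Messages app and `AppleCoreMedia/` as media playback is an assumption, not something Apple documents here).

```python
def classify_user_agent(ua: str) -> str:
    """Return a rough, heuristic label for a User-Agent string.

    The labels are hypothetical. Order matters: the link-preview UA also
    contains "Version/" and "Safari/", so it has to be checked first.
    """
    s = ua.lower()  # compare case-insensitively; some logs lowercase the UA
    if "facebookexternalhit" in s and "twitterbot" in s:
        return "imessage-link-preview"
    if s.startswith("mobilesms/"):
        return "messages-app"        # assumption: requests made by the Messages app
    if s.startswith("applecoremedia/"):
        return "media-playback"      # assumption: system media framework, not Safari
    if "crios/" in s:
        return "chrome-ios"
    if "twitter for iphone" in s or s.startswith("twitter/"):
        return "twitter-app"
    if "version/" in s and "safari/" in s:
        return "safari"
    return "unknown"


PREVIEW_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_1) AppleWebKit/601.2.4 "
    "(KHTML, like Gecko) Version/9.0.1 Safari/601.2.4 "
    "facebookexternalhit/1.1 Facebot Twitterbot/1.0"
)
SAFARI_UA = (
    "Mozilla/5.0 (iPhone; CPU iPhone OS 10_1_1 like Mac OS X) "
    "AppleWebKit/602.2.14 (KHTML, like Gecko) Version/10.0 "
    "Mobile/14B100 Safari/602.1"
)
```

Calling `classify_user_agent(PREVIEW_UA)` takes the first branch, while `classify_user_agent(SAFARI_UA)` falls through to the Safari rule. Anything that matches none of the rules comes back as `"unknown"` rather than being guessed at.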

SimpleAnecdote
  • This is correct. Curious as to why it's "impersonating" facebook/twitter in the UA string though. Perhaps they're using some open source library to parse the open-graph tags maybe? I'd like to be able to filter these requests but there's not much in here that's an artifact other than webkit in conjunction with fb/twitter. Oh well, good answer. – Benjamin Oman Jun 13 '17 at 21:06
  • @BenjaminOman Curious indeed. It'd definitely be helpful if we could use a `User-Agent` to differentiate between user agents. Alas, that's not the logic Apple employed here for whatever reasons which they keep to themselves. – SimpleAnecdote Jun 14 '17 at 08:55
  • Just in case anyone is wondering, as of December 2021 the UA string reads: `mozilla/5.0 (macintosh; intel mac os x 10_11_1) applewebkit/601.2.4 (khtml, like gecko) version/9.0.1 safari/601.2.4 facebookexternalhit/1.1 facebot twitterbot/1.0` – gillytech Dec 05 '21 at 01:58
  • @gillytech Is the difference in the case only? It seems the structure, strings, and versions are all the same to me, no? – SimpleAnecdote Dec 06 '21 at 00:41
  • @SimpleAnecdote Yeah it seems to be almost exactly the same. Just wanted to show that I checked it out and it's not changed. – gillytech Dec 07 '21 at 01:10
  • @SimpleAnecdote That's it. I guess I wasn't normalizing my strings... – gillytech Mar 20 '22 at 03:25

Use this regex to identify requests from the iMessage crawler:

`(Twitterbot(.*)facebookexternalhit)|(facebookexternalhit(.*)Twitterbot)`

I have tested it and it works perfectly.
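As a rough sketch of how the pattern might be applied in Python: the names `IMESSAGE_PREVIEW_RE` and `is_imessage_preview` are made up for this example, and the `re.IGNORECASE` flag is my own addition so that lowercased log entries (like the one quoted in the comments on the other answer) still match. Keep in mind that any other client sending both tokens would match too.

```python
import re

# Same pattern as above; IGNORECASE is an added assumption so that
# lowercased User-Agent strings from normalized logs also match.
IMESSAGE_PREVIEW_RE = re.compile(
    r"(Twitterbot(.*)facebookexternalhit)|(facebookexternalhit(.*)Twitterbot)",
    re.IGNORECASE,
)


def is_imessage_preview(user_agent: str) -> bool:
    """Return True if the UA contains both tokens, in either order."""
    return IMESSAGE_PREVIEW_RE.search(user_agent) is not None


OBSERVED_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_1) AppleWebKit/601.2.4 "
    "(KHTML, like Gecko) Version/9.0.1 Safari/601.2.4 "
    "facebookexternalhit/1.1 Facebot Twitterbot/1.0"
)

if is_imessage_preview(OBSERVED_UA):
    print("link preview request")
```

`search` is used rather than `match` because the tokens sit at the end of the string, not the start.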

Sam Soffes
Shwetabh Shekhar