13

Sample user agent strings:

Mozilla/5.0 (Linux; Android 8.0; Pixel 2 Build/OPD3.170816.012) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.1511.1269 Mobile Safari/537.36; Bytespider

Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like Mac OS X) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.7997.1233 Mobile Safari/537.36; Bytespider

Community
Gokula Kannan
  • I have begun seeing this too: `Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.8844.1316 Mobile Safari/537.36; Bytespider` – kjones Sep 13 '19 at 22:01
  • I suspect the 7 upvotes here mean "me too" rather than "this is a good question". In general, this is rather brief for a good Stack Overflow question. – halfer Oct 09 '19 at 15:57
  • https://www.cnbc.com/2019/08/12/tiktok-owner-bytedance-launches-new-search-engine-in-china.html They've launched a new search engine, and these visits are likely their spider. – GWR Jan 22 '20 at 11:29

2 Answers

12

We were seeing the same thing: a reasonably small set of Android/iOS user agents, all ending with Bytespider and all ignoring our robots.txt files. One of our platform engineers had the bright idea of running a reverse DNS lookup on the cluster of IP addresses the requests came from.

The result: the traffic appears to come from ByteDance (https://bytedance.com/).

Given they don't respect the robots.txt file, I'd consider them block-fodder.
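
For anyone who wants to repeat the reverse DNS check mentioned above, here is a minimal Python sketch. The function names, the injectable `resolver` parameter and the `example.com` suffix are illustrative assumptions; this answer does not record which hostnames the lookup actually returned, so substitute whatever your own lookups show.

```python
import socket


def reverse_dns(ip, resolver=socket.gethostbyaddr):
    """Return the PTR hostname for ip, or None if no record is found.

    resolver is injectable (an assumption made for testability); by default
    it is the standard library's socket.gethostbyaddr.
    """
    try:
        hostname, _aliases, _addresses = resolver(ip)
    except (socket.herror, socket.gaierror):
        return None
    return hostname


def looks_like(hostname, suffixes):
    """True if hostname equals, or is a subdomain of, one of the suffixes."""
    if hostname is None:
        return False
    host = hostname.rstrip(".").lower()
    return any(host == s or host.endswith("." + s) for s in suffixes)


# Usage sketch: 192.0.2.10 is a documentation placeholder address and
# "example.com" a hypothetical suffix, not real crawler data.
#   name = reverse_dns("192.0.2.10")
#   print(name, looks_like(name, ["example.com"]))
```

A PTR record is set by whoever controls the IP block, so it is worth treating the result as a hint rather than proof; resolving the returned hostname forward and checking that it maps back to the same IP makes the check stronger.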

James
4

I'm seeing this on my website as well. Every second it issues GET requests for nonexistent pages. I resorted to returning a 403 HTTP status code whenever the user agent string contains Bytespider, and to blocking IP addresses in the firewall (adding them periodically based on server logs). The majority of the requests are issued from IP addresses owned by Chinese and Singaporean ISPs, as well as by Cloudflare.
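
The 403 rule could be sketched like this for a Python WSGI application. This is an assumption for illustration only: it does not reflect my actual server setup, and `block_bytespider` and `marker` are hypothetical names. The same check can usually be expressed in a web server's own configuration instead.

```python
def block_bytespider(app, marker="bytespider"):
    """Wrap a WSGI app so requests whose User-Agent contains marker get a 403.

    The match is case-insensitive, because the sample strings use "Bytespider".
    """
    def middleware(environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "")
        if marker in user_agent.lower():
            body = b"Forbidden\n"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain; charset=utf-8"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # Anything else is passed through to the wrapped application.
        return app(environ, start_response)

    return middleware
```

Wrapping is then a one-liner such as `application = block_bytespider(application)`. A user agent check only stops clients that keep announcing themselves as Bytespider, which is why I combine it with firewall rules.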

Sample requests:

172.69.22.98 - - [30/Sep/2019:13:16:10 +0000] "GET /CloudHD/interview-of-riyaz-14-bestfriend-secret-reveals-with-proof-yaari-hai/ZVRmSmlTQlFaRDQ.html HTTP/1.1" 403 571 "-" "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.5653.1247 Mobile Safari/537.36; Bytespider"
172.68.142.101 - - [30/Sep/2019:13:18:12 +0000] "GET /CloudHD/hot-desi-girl-big-boob-s-in-blouse-nude-selfie/WmVzSi1SOEtXTjg.html HTTP/1.1" 403 571 "-" "Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like Mac OS X) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.8372.1186 Mobile Safari/537.36; Bytespider"
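
Collecting candidate IP addresses from log lines like the two above could look roughly like this. It is a sketch that assumes the combined log format shown in the samples; `LOG_LINE` and `bytespider_ips` are names made up for this example, not part of any tool.

```python
import re
from collections import Counter

# One line of the combined log format, as in the sample requests:
# ip ident user [time] "request" status size "referer" "user agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def bytespider_ips(lines, marker="bytespider"):
    """Count requests per client IP whose user agent contains marker."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and marker in match.group("agent").lower():
            counts[match.group("ip")] += 1
    return counts


# Usage sketch (the log path is hypothetical):
#   with open("access.log", encoding="utf-8", errors="replace") as fh:
#       for ip, hits in bytespider_ips(fh).most_common():
#           print(ip, hits)
```

If the site sits behind a proxy or CDN, the logged address may be the proxy's rather than the crawler's, so review the list before turning it into firewall rules.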

As you may guess, no paths even remotely resembling these are available on my website. The bot has never even tried to read /robots.txt, so there's no point in blocking it with this method.

The Semrush bot behaved almost identically until I blocked it with /robots.txt. So Bytespider may be how that bot presents itself once it is blocked and doesn't want to get bad press.

Jakub Alba