
I've been searching for npm packages, but they all seem unmaintained and rely on outdated user-agent databases. Is there a reliable, up-to-date package that helps me detect crawlers (mostly from Google, Facebook, etc., for SEO)? Or, if there's no such package, can I write one myself (probably based on an up-to-date user-agent database)?

To be clearer: I'm building an isomorphic/universal React website. I want it to be indexed by search engines and its title/meta data to be fetchable by Facebook, but I don't want to pre-render on all normal requests, so that the server isn't overloaded. The solution I'm thinking of is to pre-render only for requests coming from crawlers.
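The approach above can be sketched as a small user-agent check — a minimal, illustrative version only, assuming an Express-style server; the regex is not exhaustive, and `prerender()` in the comment is a hypothetical helper standing in for whatever pre-rendering you use:

```javascript
// Classify a request by its user-agent so that only crawlers get the
// pre-rendered HTML. The pattern below covers common crawlers (Googlebot,
// Bingbot, Facebook's fetcher, etc.) but is illustrative, not exhaustive.
const BOT_REGEX = /bot|crawler|spider|crawling|facebookexternalhit|slurp/i;

function isCrawler(userAgent) {
  return BOT_REGEX.test(userAgent || '');
}

// In an Express middleware this might look like (prerender() is hypothetical):
//   if (isCrawler(req.get('user-agent'))) res.send(prerender(req.url));
//   else next();

console.log(isCrawler('Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)')); // true
console.log(isCrawler('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36')); // false
```

A hand-rolled regex like this is exactly what a maintained package would keep current for you, which is why the answers below lean toward using a library instead.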

Steve Bennett
KwiZ

3 Answers

3

I found this isbot package that has the built-in isbot() function. It seems to me that the package is properly maintained and that they keep everything up to date.

USAGE:

const isBot = require('isbot');

...

isBot(req.get('user-agent'));

Package: https://www.npmjs.com/package/isbot

NeNaD
2

The best solution I've found is the useragent library, which allows you to do this:

var useragent = require('useragent');
// for an actual request use: useragent.parse(req.headers['user-agent']);
var agent = useragent.parse('Googlebot-News');

// will log true
console.log(agent.device.toJSON().family === 'Spider');

It is fast and kept up-to-date pretty well. Seems like the best approach. Run the above script in your browser: runkit

  • NOTE, this post did not age well. Last update to 'useragent' was in 2019 (https://github.com/3rd-Eden/useragent) – cevaris Aug 13 '23 at 02:57
1

I have nothing to add to your search for npm packages. But as for your question about an up-to-date user-agent database to build your own package, I would recommend ua.theafh.net

At the moment it has data up to Nov 2014, and with more than 5.4 million agents it is, as far as I know, also the largest search engine for user agents.

theafh
  • It seems to be a db of all user agents ever seen, doesn't it? So how can I use it to tell whether a user agent string is from a crawler? – KwiZ Jan 08 '16 at 03:13
  • Oh I see, there's a "class" column which classifies whether it's a browser or a bot. But do I have to compare equality of the whole string? – KwiZ Jan 08 '16 at 03:39
  • And btw, seems that they don't have APIs, so how can I get the list of bots' user agents? – KwiZ Jan 08 '16 at 04:07
  • There is no API, but you can download the results as CSV via the icon on the right. You can also use wildcard search and "advanced settings" to filter for bots in general, or mobile, etc. For example: http://ua.theafh.net/list.php?s=%22%2A%22&include=yes&class=abt&do=desc – theafh Jan 08 '16 at 15:14
  • I know that we can download the CSV with the "save" icon, but it only appears when I search for specific words. I thought of combining the results of the vowel queries, but it also limits the results to 1000 without pagination, so searching "u" doesn't give all the results – KwiZ Jan 09 '16 at 02:30
  • Well, it's a search engine, and you also can't download Google's index... ;-) It's also not necessary for testing your bot detection code: you only need a few thousand examples, not every bot in that engine. There is a lot of redundancy in the total of all bot agents, software revisions and so on... – theafh Jan 10 '16 at 14:20
  • But at least I can move to the next page to see all the results with Google :) I'm not sure the first 1000 results are enough to cover most of the significant bots today. If they aren't, I haven't found a proper way to apply this db for my use yet – KwiZ Jan 11 '16 at 09:51