31

For research purposes, I'd like to list all the packages that are available on npm. How can I do this?

Some old docs at https://github.com/npm/registry/blob/master/docs/REGISTRY-API.md#get-all mention an /-/all endpoint that presumably once worked, but http://registry.npmjs.org/-/all now just returns {"message":"deprecated"}.

Mark Amery

1 Answer

40

http://blog.npmjs.org/post/157615772423/deprecating-the-all-registry-endpoint describes the deprecation of the http://registry.npmjs.org/-/all endpoint, and links to the tutorial at https://github.com/npm/registry/blob/master/docs/follower.md as an alternative approach. That tutorial describes how to set up a "follower" that receives all changes made to the NPM registry. That's... a bit odd, honestly. Clearly such a follower is not an adequate substitute for getting a list of all packages if you want to do data analysis on the entire NPM ecosystem.

However, within that codebase we learn that at the heart of the NPM registry is a CouchDB database located at https://replicate.npmjs.com. The _all_docs endpoint is not disabled, so we can hit it at https://replicate.npmjs.com/_all_docs to get back a JSON object whose rows property contains one entry for every public package on NPM. Each row looks like:

{"id":"lodash","key":"lodash","value":{"rev":"634-9273a19c245f088da22a9e4acbabc213"}},

At the time of writing, that response contains 618,660 rows and comes to around 64 MB.
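As a rough sketch of how that response could be consumed, the Python below pulls the package names out of the rows list. The function names are made up for illustration, and the filter on `_design/` ids assumes the database may expose CouchDB design documents alongside packages; I have not verified that it does.

```python
import json
import urllib.request
from typing import List

ALL_DOCS_URL = "https://replicate.npmjs.com/_all_docs"


def package_names(all_docs: dict) -> List[str]:
    """Return the package names found in a parsed _all_docs response."""
    # Assumption: any CouchDB design documents ("_design/...") are not packages.
    return [
        row["id"]
        for row in all_docs["rows"]
        if not row["id"].startswith("_design/")
    ]


def fetch_all_docs(url: str = ALL_DOCS_URL) -> dict:
    """Download and parse the whole listing; this loads it all into memory."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)
```

Calling `package_names(fetch_all_docs())` would then give a plain list of names, provided the endpoint still behaves as described here.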

If you want more data about a particular package, you can look it up by its key - e.g. hit https://replicate.npmjs.com/lodash to get a huge document containing stuff like Lodash's description and release history.
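A minimal sketch of that lookup in Python follows. The helper names are hypothetical, and percent-encoding the slash in scoped names such as `@scope/pkg` is my assumption about how the document id has to be passed in the URL path.

```python
import json
import urllib.parse
import urllib.request

REGISTRY_DB = "https://replicate.npmjs.com"


def package_doc_url(name: str, base: str = REGISTRY_DB) -> str:
    """Build the URL of the document for one package."""
    # Assumption: "/" in a scoped name must be encoded as %2F so the name
    # stays a single path segment; "@" is left as-is.
    return f"{base}/{urllib.parse.quote(name, safe='@')}"


def fetch_package_doc(name: str) -> dict:
    """Download and parse the (potentially very large) package document."""
    with urllib.request.urlopen(package_doc_url(name)) as response:
        return json.load(response)
```

For example, `fetch_package_doc("lodash").get("description")` should return the description mentioned above, if the document is laid out the way I expect.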

If you want all the current data about all packages, you could use the include_docs parameter to _all_docs to include the actual document bodies in the response - i.e. hit https://replicate.npmjs.com/_all_docs?include_docs=true. Be ready for a lot of data.
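One way to avoid a single enormous download is to page through the rows with CouchDB's standard `limit` and `startkey` query parameters. The sketch below is untested against the live registry, and its function names and default page size are illustrative choices only. Each request asks for one row more than the page size, and the key of that extra row becomes the start of the next page.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator, Optional

ALL_DOCS_URL = "https://replicate.npmjs.com/_all_docs"


def page_url(startkey: Optional[str], limit: int, include_docs: bool = False) -> str:
    """Build the URL for one page of _all_docs."""
    params = {"limit": str(limit)}
    if startkey is not None:
        # CouchDB expects keys as JSON values, so strings need their quotes.
        params["startkey"] = json.dumps(startkey)
    if include_docs:
        params["include_docs"] = "true"
    return ALL_DOCS_URL + "?" + urllib.parse.urlencode(params)


def fetch_json(url: str) -> dict:
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def iter_rows(
    page_size: int = 1000,
    include_docs: bool = False,
    fetch: Callable[[str], dict] = fetch_json,
) -> Iterator[dict]:
    """Yield every row of _all_docs, one page at a time."""
    startkey = None
    while True:
        # Request one extra row; its key marks where the next page begins.
        rows = fetch(page_url(startkey, page_size + 1, include_docs))["rows"]
        yield from rows[:page_size]
        if len(rows) <= page_size:
            return
        startkey = rows[page_size]["key"]
```

Because `iter_rows` is a generator, rows can be written to disk as they arrive instead of being held in memory, which matters most when `include_docs=True`.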

If you need data that is not included in these CouchDB documents, like download counts, then it is worth perusing the docs at https://github.com/npm/registry/tree/master/docs which detail some other available APIs - with the caveat, noted in the question, that not everything documented there actually works.

Mark Amery
  • 3
    Note that you can also [paginate](http://docs.couchdb.org/en/2.1.1/ddocs/views/pagination.html#paging) the `_all_docs` endpoint. – abraham Jul 04 '18 at 16:45
  • 2
    For anyone considering using the endpoint with `include_docs=true`, as of today (2020-05-29) the full download is just under 35GB. – tao_oat May 29 '20 at 15:26
  • 1
    As of 2020-06-30, the total number of rows in `_all_docs` is `1322754`!! – ezio4df Jun 30 '20 at 13:46
  • As of 2022, I fear to even download it. Oh well, will edit with current size soon (I hope) :D – TDiblik Aug 19 '22 at 12:24
  • 1
    As of March 2023, the total number of rows is 2,347,679 and the total file size is ~250 MB (without include_docs=true) – ColinE Mar 17 '23 at 13:26
  • @ColinE How long did it take to download the whole file? I failed to establish a stable enough connection. – xiaoyu2006 Aug 19 '23 at 13:47