From the Modin documentation (https://modin.readthedocs.io/en/latest/):
Modin uses Ray or Dask to provide an effortless way to speed up your
pandas notebooks, scripts, and libraries. Unlike other distributed
DataFrame libraries, Modin provides seamless integration and
compatibility with existing pandas code. Even using the DataFrame
constructor is identical.
Two main features that stand out are:
- Using multiple cores of CPU with same pandas API:
In pandas, you are only able to use one core at a time when you are
doing computation of any kind. With Modin, you are able to use all of
the CPU cores on your machine.
- Support for very big datasets
With Modin, because of its light-weight, robust, and scalable nature,
you get a fast DataFrame at 1MB and 1TB+
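
In practice, the drop-in claim quoted above comes down to changing one import line. The snippet below is a minimal sketch, not taken from the Modin docs: the column names and values are made up for illustration, and the fallback to plain pandas is my own addition so the script still runs where Modin is not installed. Modin also needs an execution engine (Ray or Dask) available when it is first used.

```python
# Prefer Modin when it is installed; otherwise fall back to plain pandas.
# The fallback is an assumption added for illustration.
try:
    import modin.pandas as pd
except ImportError:
    import pandas as pd

# The constructor and groupby calls are written exactly as for pandas.
# Column names and values are hypothetical.
df = pd.DataFrame({"group": ["a", "b", "a", "b"], "value": [1, 2, 3, 4]})
totals = df.groupby("group")["value"].sum()
print(totals)
```

The rest of an existing script should not need to change, which also makes it easy to switch back if a given operation turns out slower.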
Specifically for the slow groupby part of the question, there is a GitHub discussion reporting that plain pandas performed better than modin.pandas for that operation:
https://github.com/modin-project/modin/issues/895
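
Since results like the one in that discussion depend on the data and the Modin version, it is worth timing the groupby on your own machine before switching. Here is a rough sketch of such a comparison. The helper names `load_backend` and `time_groupby`, the synthetic data, and the row counts are all hypothetical, and a single timed run is only a coarse indication, not a proper benchmark.

```python
import importlib
import time


def load_backend(name):
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


def time_groupby(pd_module, n_rows=200_000, n_groups=1_000):
    """Time groupby().sum() on synthetic data using a pandas-like module."""
    df = pd_module.DataFrame({
        "key": [i % n_groups for i in range(n_rows)],
        "value": [float(i) for i in range(n_rows)],
    })
    start = time.perf_counter()
    result = df.groupby("key")["value"].sum()
    len(result)  # touch the result so any deferred work is counted
    return time.perf_counter() - start


if __name__ == "__main__":
    # The main guard matters when an engine spawns worker processes.
    for name in ("pandas", "modin.pandas"):
        backend = load_backend(name)
        if backend is None:
            print(f"{name}: not installed, skipped")
            continue
        print(f"{name}: {time_groupby(backend):.3f} s")
```

The first Modin call also pays for engine startup, so run the timing more than once, and use data close to your real size and key cardinality.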
Modin is still under active development. The README.md in its GitHub repo (https://github.com/modin-project/modin) tabulates pandas API coverage, mentioning these functions:
