There might be a few ways to optimize this.
In CQEngine I tend to use the term ordering to describe the desired outcome, and then there can be multiple ordering strategies to achieve the ordering.
The default behavior in CQEngine is to use what's known as the materialize ordering strategy, which involves copying all of the results matching a query into a temporary collection which is then sorted explicitly. This does not scale well to large ResultSets, so I'm not surprised that it would be slow for a ResultSet of 500K elements in this case. See OrderingStrategies for details.
However the TL;DR is: you could request CQEngine to use the index ordering strategy.
Index-accelerated ordering on attribute A
In your example you need to order results by (A, B), so you would request CQEngine to use an index on attribute A to accelerate the ordering. As you need to order by A and then B, this won't completely eliminate the need to sort results, because if the engine finds multiple objects which match the query colocated in a single bucket in index A, it will still have to explicitly sort a small number of matching objects in that bucket by attribute B in order to achieve the overall ordering by (A, B).
All other things being equal, based on a collection size of 10MM elements, and 2MM distinct values for A, you can expect each bucket in the index on A to contain 5 objects. So I'd expect that this approach could achieve a major reduction in time-to-first-result latency. It might not be perfect though...
Partial indexes
The approach above generally works well for data which resembles time-series, where A might be a timestamp, and you want to search recent items and your want to order results by recency too.
However a downside of using an index on attribute A to drive the search for workloads which don't resemble time-series, is that the ordered index on A which is being traversed to achieve the ordering of results, might be polluted with lots of objects which don't match the query you are trying to accelerate. So this can introduce filtering overhead.
You didn't mention much about your queries. For example do you know the set of queries in advance, or are they completely arbitrary?
I ask about this, because if you know which queries you expect in advance (or at least some fragments of the queries you expect in advance), then you could set up PartialIndexes on attribute A which are configured with filter queries you expect to find in your workload.
Partial indexes (plus enabling the index ordering strategy) can be a great way to mitigate index pollution, which means an index can be used to accelerate the ordering of large ResultSets, and it won't be susceptible to excessive filtering. Hope that helps!