Andy's answer is fairly complete, but I want to add a few more insights on why returning pages that are not exactly the requested size may be useful - in current or future implementations:
One reason why Cassandra may want to return short pages is filtering. Imagine that the request has ALLOW FILTERING and needs to read a lot of data from disk just to produce a few rows that pass the filter and get returned to the client. The client, unaware of this, has asked for a page of 1000 rows - but in our example, actually producing 1000 rows that pass the filter might take 10 seconds, and the client would time out if Cassandra waited 10 seconds before producing any results. So in this case, Cassandra should just return whatever rows it managed to collect before the timeout - even if that's just 17 rows and not 1000. The client receives these 17 rows and requests the next page normally.
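To make this concrete, here is a minimal sketch using the Python driver (cassandra-driver); the keyspace, table, and filter condition are hypothetical stand-ins. The point is only that the first page of an ALLOW FILTERING query can legitimately come back much shorter than the requested fetch_size:

```python
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

# Hypothetical cluster, keyspace, table, and filter - adjust to taste.
cluster = Cluster(['127.0.0.1'])
session = cluster.connect('my_keyspace')

# Ask for pages of 1000 rows. With a highly selective ALLOW FILTERING
# query, the server may give up before filling the page and return
# whatever it has collected so far.
stmt = SimpleStatement(
    "SELECT * FROM events WHERE status = 'rare' ALLOW FILTERING",
    fetch_size=1000,
)

result = session.execute(stmt)
print(len(result.current_rows))  # may well be 17, not 1000
```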
In the extreme case, there may be so much filtering work with so little output that a long time passes without even a single row being produced. In this case, before timing out, Cassandra may return a page with zero results but with the has_more bit on, meaning the client should continue paging (a page with fewer results than requested - or even zero - is not the signal to stop paging!). I'm not sure whether Cassandra actually returns zero-row pages today, but Scylla (a faster Cassandra clone) definitely does, and drivers should remember to use the has_more bit as the only signal of when to stop paging.
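Continuing the sketch above, a manual paging loop that respects this looks roughly like the following. Note that an empty page does not end the loop; only the has_more flag (exposed by this driver as has_more_pages) going false does:

```python
# Manual paging loop: the ONLY stop condition is the server's
# has_more flag. A short page - or even an empty page - with the
# flag still set just means "keep going".
rows_seen = 0
result = session.execute(stmt)
while True:
    rows_seen += len(result.current_rows)  # current page only; may be 0
    if not result.has_more_pages:
        break  # the real end-of-data signal
    result = session.execute(stmt, paging_state=result.paging_state)
print(f"total rows: {rows_seen}")
```

A loop that instead stops when a page comes back empty (or shorter than fetch_size) will silently truncate results against a server that emits zero-row pages.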
The other question is why paging would ever return more rows than requested. As Andy said in his reply, I don't think this actually happens in Cassandra, nor in Scylla. But I can understand why some future implementation may want to allow it: imagine that a coordinator needs 1000 rows for a page, so it reads up to 1000 rows from each replica. If the data is inconsistent and one replica has an extra row, the coordinator may end up with 1001 rows to return. It can (and today does) return only the first 1000 rows, but the downside is that some of the replicas are now at the wrong position in the data and will need to find their place again when asked to read the next page. Had we returned all 1001 rows we found, all of the replicas would be able to resume their reads efficiently from exactly where they left off.
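To illustrate the replica-alignment argument, here is a toy sketch (plain Python, not anything resembling Cassandra's actual internals) of a coordinator merging one page from each of two replicas, where one replica holds an extra row:

```python
import heapq

# Toy illustration of the coordinator's dilemma. Each replica returns
# a sorted page of up to 1000 row keys; the coordinator merges them
# and drops duplicates.
def merge_replica_pages(pages):
    merged = []
    for key in heapq.merge(*pages):
        if not merged or key != merged[-1]:  # deduplicate adjacent keys
            merged.append(key)
    return merged

replica_a = list(range(1000))                   # keys 0..999
replica_b = sorted(list(range(999)) + [500.5])  # keys 0..998, plus an extra row

merged = merge_replica_pages([replica_a, replica_b])
print(len(merged))  # 1001 - one more than the requested page of 1000

# Truncating to merged[:1000] keeps the page size exact, but that page
# ends at key 998, while replica A has already read through key 999 -
# so replica A must re-find its position for the next page. Returning
# all 1001 rows would let both replicas resume exactly where they left off.
```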