Your mergesort-like approach seems very reasonable. More generally, this type of sorting algorithm is called an external sorting algorithm. These algorithms often work as you've described: load a chunk of the data that fits in memory, sort it, and write the sorted run back out to disk. At the end, use a merging algorithm to merge all the runs back together. The choice of how much data to load at a time and which sorting algorithm to use for the in-memory step are usually the dominant concerns. I'll focus mostly on the sorting algorithm choice.
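For concreteness, here's a minimal sketch of that pattern in Python, assuming newline-terminated text records and a made-up `chunk_size`; `heapq.merge` handles the k-way merge at the end:

```python
import heapq
import os
import tempfile

def external_sort(input_path, output_path, chunk_size=100_000):
    """Sort a large file of newline-terminated records using limited memory."""
    run_paths = []

    # Pass 1: read chunks that fit in memory, sort each one, and write
    # the sorted runs out to temporary files.
    with open(input_path) as f:
        while True:
            chunk = [line for _, line in zip(range(chunk_size), f)]
            if not chunk:
                break
            chunk.sort()  # any in-memory sort works here (see below)
            fd, path = tempfile.mkstemp(text=True)
            with os.fdopen(fd, "w") as run:
                run.writelines(chunk)
            run_paths.append(path)

    # Pass 2: k-way merge of the sorted runs into the final output.
    runs = [open(p) for p in run_paths]
    try:
        with open(output_path, "w") as out:
            out.writelines(heapq.merge(*runs))
    finally:
        for r in runs:
            r.close()
        for p in run_paths:
            os.remove(p)
```

In practice you'd pick `chunk_size` based on how much memory you can actually spare, and the in-memory `chunk.sort()` call is exactly where the algorithm choice discussed below comes in.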
Your concerns about quicksort's worst-case behavior are, generally speaking, nothing to worry about: if you choose the pivot randomly, the probability of hitting a really bad runtime is vanishingly small. The random-pivot strategy also works well even if the data is already sorted, since no fixed input can reliably trigger the worst case (unless someone knows your random number generator and the seed). Alternatively, you could use a quicksort variant like introsort, which falls back to heapsort once the recursion gets too deep and therefore has a guaranteed O(n log n) worst case.
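As an illustration (not your code), a random-pivot quicksort is only a couple of lines different from the textbook version:

```python
import random

def quicksort(a, lo=0, hi=None):
    """In-place quicksort with a randomly chosen pivot."""
    if hi is None:
        hi = len(a) - 1
    while lo < hi:
        # Random pivot: no fixed input can reliably produce the O(n^2) case.
        p = random.randint(lo, hi)
        a[p], a[hi] = a[hi], a[p]
        # Lomuto partition around the pivot now sitting at a[hi].
        pivot, i = a[hi], lo
        for j in range(lo, hi):
            if a[j] <= pivot:
                a[i], a[j] = a[j], a[i]
                i += 1
        a[i], a[hi] = a[hi], a[i]
        # Recurse on the smaller half and loop on the larger one,
        # so the stack stays O(log n) even in unlucky cases.
        if i - lo < hi - i:
            quicksort(a, lo, i - 1)
            lo = i + 1
        else:
            quicksort(a, i + 1, hi)
            hi = i - 1
```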
That said, since you know that the data is already partially sorted, you may want to look into an adaptive sorting algorithm for your sorting step. You've mentioned insertion sort for this, but there are much better adaptive algorithms out there. If memory is scarce (as you've described), look into the smoothsort algorithm, which has best-case runtime O(n), worst-case runtime O(n log n), and uses only O(1) extra memory. It's not as adaptive as some other algorithms (like Python's timsort, natural mergesort, or Cartesian tree sort), but it has lower memory usage. It's also not as fast as a well-tuned quicksort, but if the data really is mostly sorted it can do quite well.
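If it helps to see what "adaptive" buys you, here's a rough sketch of natural mergesort (one of the algorithms mentioned above). It isn't what I'd recommend for your memory constraints, since unlike smoothsort it needs O(n) extra space, but it shows the core idea of exploiting the runs that already exist in the input: mostly-sorted data means few runs and very little merging work.

```python
from heapq import merge

def natural_mergesort(a):
    """Return a sorted copy of a, doing less work the more sorted a already is."""
    # Split the input into its existing non-decreasing runs.
    runs, start = [], 0
    for i in range(1, len(a) + 1):
        if i == len(a) or a[i] < a[i - 1]:
            runs.append(a[start:i])
            start = i
    if not runs:
        return []
    # Repeatedly merge adjacent runs until a single sorted run remains.
    # Already-sorted input is one run, so this loop never executes: O(n) total.
    while len(runs) > 1:
        runs = [list(merge(runs[i], runs[i + 1])) if i + 1 < len(runs) else runs[i]
                for i in range(0, len(runs), 2)]
    return runs[0]
```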
Hope this helps!