There are a lot of discussions on the web about sorting huge files on Unix when the data will not fit into memory, generally using external mergesort and its variants.
However, suppose there is enough memory to hold the entire dataset: what would be the most efficient / fastest way to sort it? The CSV files are ~50 GB (> 1 billion rows), and the machine has enough memory (5x the size of the data) to hold it all.
I can use Unix sort, but that still takes > 1 hour. I can use whatever language is necessary; what I am primarily looking for is speed. I understand the data could be loaded into, say, a columnar-type database table and sorted there, but this is a one-time effort, so I am looking for something more nimble.
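For reference, this is roughly what I have been timing with GNU sort (the key column, separator, file names, and tuning values below are placeholders for illustration; the real run sorts on the first column):

```sh
# Roughly what I've been timing: plain GNU sort on the first CSV column.
# LC_ALL=C avoids locale-aware collation, which is noticeably slower.
LC_ALL=C sort -t, -k1,1 -o sorted.csv input.csv

# Tuned variant I've also tried: large in-memory buffer (-S), multiple
# threads (--parallel), and temp files on tmpfs (-T) in case it spills anyway.
LC_ALL=C sort -t, -k1,1 -S 80% --parallel=16 -T /dev/shm -o sorted.csv input.csv
```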
Thanks in advance.