-3

I am trying to sort 10 million Account objects held in an array or ArrayList. The Account class implements the Comparable interface and has fields such as age, account number, etc. I need to sort the collection by age while keeping the relative order of accounts that have the same age.

I am thinking of using merge sort here, because it is a stable comparison sort, so it keeps the relative ordering, and its worst-case running time is O(n log n). However, a binary tree sort would behave similarly, with the same time complexity, for this number of objects. What do you think?

  • It is better to use the modified merge sort that is already implemented in Java. `accountList.sort(Comparator)` will be the most efficient technique, where accountList is the list and Comparator is the comparator built from the different criteria. Internally, the list is converted to an array for sorting. – Manu Joy Jun 11 '15 at 05:48
  • This sounds like a job for the DB to be honest. Also, the complexity of an algorithm is not affected by the number of objects. – Evan Knowles Jun 11 '15 at 05:53
  • First of all, you haven't specified how optimal a solution the problem requires, what your main operations after sorting are, or whether the numbers are 32-bit or 64-bit. These questions largely determine which data structure and algorithm to use. There is more to this problem than just sorting when dealing with millions or billions of numbers, so please provide those parameters so that the question is not open-ended. – Cyclotron3x3 Jun 11 '15 at 06:29

5 Answers

2

If you really want to sort by 'age', how about using counting sort (http://en.wikipedia.org/wiki/Counting_sort)? It keeps the original relative order and needs at most 2 passes over the data, i.e. about 2n lookups.
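A minimal sketch of a stable counting sort keyed on age, to make the two passes concrete. The `Account` class, its `age` and `acctNumber` fields, and the `MAX_AGE` bound are all assumptions made for illustration; they are not taken from the question's actual code.

```java
public class CountingSortByAge {

    // Hypothetical stand-in for the question's Account class.
    static final class Account {
        final int age;
        final long acctNumber;

        Account(int age, long acctNumber) {
            this.age = age;
            this.acctNumber = acctNumber;
        }
    }

    // Assumption: every age lies in the range [0, MAX_AGE].
    static final int MAX_AGE = 150;

    static Account[] sortByAge(Account[] input) {
        // Pass 1: count how many accounts have each age.
        // The count for age k is stored at index k + 1.
        int[] start = new int[MAX_AGE + 2];
        for (Account a : input) {
            start[a.age + 1]++;
        }
        // Prefix sums: start[k] becomes the first output index for age k.
        for (int age = 0; age <= MAX_AGE; age++) {
            start[age + 1] += start[age];
        }
        // Pass 2: copy each account to its slot. Scanning the input in its
        // original order keeps accounts of equal age in that order (stable).
        Account[] out = new Account[input.length];
        for (Account a : input) {
            out[start[a.age]++] = a;
        }
        return out;
    }

    public static void main(String[] args) {
        Account[] accounts = {
            new Account(30, 1L), new Account(25, 2L),
            new Account(30, 3L), new Account(25, 4L)
        };
        for (Account a : sortByAge(accounts)) {
            System.out.println(a.age + " " + a.acctNumber);
        }
    }
}
```

This runs in O(n + k) time for n accounts and k possible ages, at the cost of a second array of n references.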

tj-recess
2

From the javadoc of Collections.sort():

This sort is guaranteed to be stable: equal elements will not be reordered as a result of the sort.

So don't reinvent the wheel, and just use the standard sort algorithm that the JDK provides: Collections.sort() or, better if using Java 8: List.sort(). Without any warmup that would allow the JIT to optimize the code, sorting 10M accounts with an age between 0 and 30 takes 1.4 seconds on my machine.
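As a rough illustration of the Java 8 route, the sketch below sorts a list by age with `List.sort()` and `Comparator.comparingInt()`. The `Account` class and its `getAge()` accessor are assumptions for the example, not the asker's actual class.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class StableListSort {

    // Hypothetical stand-in for the question's Account class.
    static final class Account {
        private final int age;
        private final long acctNumber;

        Account(int age, long acctNumber) {
            this.age = age;
            this.acctNumber = acctNumber;
        }

        int getAge() { return age; }
        long getAcctNumber() { return acctNumber; }
    }

    // Sorts in place by age only. The JDK sort is stable, so accounts
    // with the same age keep their original relative order.
    static void sortByAge(List<Account> accounts) {
        accounts.sort(Comparator.comparingInt(Account::getAge));
    }

    public static void main(String[] args) {
        List<Account> accounts = new ArrayList<>();
        accounts.add(new Account(30, 1L));
        accounts.add(new Account(25, 2L));
        accounts.add(new Account(30, 3L));
        accounts.add(new Account(25, 4L));

        sortByAge(accounts);
        for (Account a : accounts) {
            System.out.println(a.getAge() + " " + a.getAcctNumber());
        }
    }
}
```

Passing an explicit comparator keeps the ordering by age separate from whatever `compareTo` the class already defines.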

JB Nizet
1

I prefer using merge sort as it does not add space complexity.

Quicksort could also be considered, provided space and memory allocation are not a constraint.

Pavan Kumar K
1

I think you can do it in a series of steps:

step 1: split the 10 million objects into 2^N slices and sort each slice;

step 2: take 2 slices, repeatedly select the smaller of their head objects, and merge them into a new slice;

step 3: repeat step 2 until only 1 slice is left.
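The steps above could be sketched roughly as follows. This is only one reading of the answer: the `sliceSize` parameter is an assumption (the answer fixes the number of slices at 2^N rather than their size), and each slice is sorted with the JDK's `List.sort()` for brevity. Ties are always taken from the left slice, so the result stays stable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class SliceMergeSort {

    static <T> List<T> sort(List<T> input, int sliceSize, Comparator<? super T> cmp) {
        if (sliceSize <= 0) {
            throw new IllegalArgumentException("sliceSize must be positive");
        }
        // Step 1: cut the input into consecutive slices and sort each one.
        List<List<T>> slices = new ArrayList<>();
        for (int from = 0; from < input.size(); from += sliceSize) {
            int to = Math.min(from + sliceSize, input.size());
            List<T> slice = new ArrayList<>(input.subList(from, to));
            slice.sort(cmp);
            slices.add(slice);
        }
        if (slices.isEmpty()) {
            return new ArrayList<>();
        }
        // Steps 2 and 3: merge neighbouring slices until one slice is left.
        while (slices.size() > 1) {
            List<List<T>> next = new ArrayList<>();
            for (int i = 0; i < slices.size(); i += 2) {
                if (i + 1 < slices.size()) {
                    next.add(merge(slices.get(i), slices.get(i + 1), cmp));
                } else {
                    next.add(slices.get(i)); // odd slice out, carried over as-is
                }
            }
            slices = next;
        }
        return slices.get(0);
    }

    // Repeatedly takes the smaller head element; on ties the left slice wins,
    // which preserves the original relative order of equal elements.
    static <T> List<T> merge(List<T> left, List<T> right, Comparator<? super T> cmp) {
        List<T> out = new ArrayList<>(left.size() + right.size());
        int i = 0;
        int j = 0;
        while (i < left.size() && j < right.size()) {
            if (cmp.compare(left.get(i), right.get(j)) <= 0) {
                out.add(left.get(i++));
            } else {
                out.add(right.get(j++));
            }
        }
        while (i < left.size()) out.add(left.get(i++));
        while (j < right.size()) out.add(right.get(j++));
        return out;
    }

    public static void main(String[] args) {
        List<Integer> ages = Arrays.asList(42, 17, 30, 17, 65, 30, 8);
        System.out.println(sort(ages, 2, Comparator.naturalOrder()));
    }
}
```

Because the slices are independent in step 1, they could also be sorted on separate threads, though that is not shown here.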

Peter Pan
0

It depends on parameters such as how optimal a solution the problem requires, what your main operations after sorting are, and whether the numbers are 32-bit or 64-bit, i.e. what your project requirements are.

Look at the difference between internal sorting and external sorting. Your approach requires an external sorting mechanism.

For example, if you only want to count the ages of the employees, you would probably use counting sort, which can sort the data in memory.

But for fairly random data, you need external sorting.

Cyclotron3x3