How does one specify the TotalOrderPartitioner when using mrjob? Is this the default, or must it be specified explicitly? I've seen inconsistent behavior on different data sets.

vy32

1 Answer

In the Hadoop Java API, you specify it on the job with job.setPartitionerClass(TotalOrderPartitioner.class);

It is not the default partitioner class. The default is the HashPartitioner class.
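To see why the default does not give globally ordered output, here is a minimal stand-alone sketch of the formula HashPartitioner uses to pick a reducer. It does not depend on Hadoop; the class name HashPartitionSketch and the sample keys are made up for illustration.

```java
final class HashPartitionSketch {
    // Same arithmetic as Hadoop's HashPartitioner: clear the sign bit of the
    // key's hash code, then take the remainder by the number of reducers.
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        // Hypothetical keys: neighbouring values are spread across reducers.
        for (int key : new int[] {1, 2, 3, 10, 11, 12}) {
            System.out.println(key + " -> reducer " + partitionFor(key, 3));
        }
    }
}
```

Each reducer still sees its own keys in sorted order, but because neighbouring keys land on different reducers, concatenating the output files does not produce one sorted sequence.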

It's not an easy partitioner to use. TotalOrderPartitioner needs a partition file of split points, which you typically produce by pre-sampling your input with an InputSampler before the job runs.
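The idea behind the sampling step can be sketched without Hadoop. The code below is only a conceptual illustration, not the Hadoop implementation: the class TotalOrderSketch, its methods, and the sample keys are hypothetical. It picks split points from a sorted sample, then sends each key to the range it falls into.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class TotalOrderSketch {
    // Choose numPartitions - 1 split points at evenly spaced positions
    // in the sorted sample.
    static List<String> splitPoints(List<String> sample, int numPartitions) {
        List<String> sorted = new ArrayList<>(sample);
        Collections.sort(sorted);
        List<String> splits = new ArrayList<>();
        for (int i = 1; i < numPartitions; i++) {
            splits.add(sorted.get(i * sorted.size() / numPartitions));
        }
        return splits;
    }

    // Keys below splits[0] go to partition 0, keys in [splits[0], splits[1])
    // go to partition 1, and so on. A key equal to a split point starts the
    // next partition.
    static int partitionFor(String key, List<String> splits) {
        int pos = Collections.binarySearch(splits, key);
        return pos >= 0 ? pos + 1 : -pos - 1;
    }

    public static void main(String[] args) {
        // Hypothetical sample of keys drawn from the input.
        List<String> sample = Arrays.asList("d", "a", "f", "b", "e", "c");
        List<String> splits = splitPoints(sample, 3);
        for (String key : Arrays.asList("a", "c", "d", "e", "z")) {
            System.out.println(key + " -> reducer " + partitionFor(key, splits));
        }
    }
}
```

Because every key in partition i sorts before every key in partition i + 1, reading the reducer outputs in partition order gives one globally sorted result. How balanced the partitions are depends on how well the sample represents the full input.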

I wrote a very detailed tutorial with examples and illustrations (from beginner to advanced usage) on how to use these here.

Nicomak
  • Thanks for the reference to the tutorial. It's very good. You are right, it is complex. – vy32 May 27 '16 at 19:03