I have a project based on mrjob, with automated tests. One test runs mrjob locally against known input, and asserts the actual output matches expected output.
The issue is that the test passes in my development environment but fails in continuous integration. The failure is due to the sort order of the output lines.
What can I do to make sure that the output is sorted consistently across environments (without sorting the files in bash manually)? I already sort the input files consistently.
I checked the following between dev and CI: the OS versions are the same (well, almost: Ubuntu 14.04.3 vs 14.04.2), the Python versions are the same (2.7.6), and the locales are the same (en_US.UTF-8).
FWIW I start the job programmatically like so:
mr_job = myMrjob(args=args)
with mr_job.make_runner() as runner, open(output_filename, 'w') as fout:
    runner.run()
    for line in runner.stream_output():
        fout.write(line)
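One possible workaround, given here only as a minimal sketch, is to make the comparison itself independent of line order by sorting both sides in Python inside the test. `assert_same_lines` is a hypothetical helper name, not part of mrjob, and the sketch assumes the order of output lines carries no meaning for the job. Python's default string comparison does not depend on the locale, so the sorted order should be the same in dev and CI.

```python
def assert_same_lines(actual_lines, expected_lines):
    """Assert that two iterables of lines match, ignoring line order.

    Hypothetical test helper (not part of mrjob). Trailing newlines are
    stripped so a missing final newline does not cause a false mismatch.
    """
    actual = sorted(line.rstrip('\n') for line in actual_lines)
    expected = sorted(line.rstrip('\n') for line in expected_lines)
    assert actual == expected, 'output differs: %r != %r' % (actual, expected)
```

In the test, the lines yielded by `runner.stream_output()` could be passed as `actual_lines` and the lines of the expected-output file as `expected_lines`.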