I have read this related question:
How to make hive run mapreduce jobs concurrently?
My question is: how do I set the "hive.exec.parallel.thread.number" option on an Amazon EMR cluster at startup?
Also, is setting this option equivalent to doing something like the following?
cat hive_script.hql | parallel --gnu hive -e '{}'
The statements in my Hive script can run in any order. Each one simply launches a job for a new time-based partition of an existing table, in order to create the matching time-based partition of a derivative table.
If they are not equivalent in my case, will one of these strategies yield better performance?
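For reference, here is a minimal sketch of how I understand the option would be set at the session level, for example at the top of hive_script.hql. The placement and the value 8 are my assumptions (8 is, as far as I know, the documented default), and this does not yet cover applying it cluster-wide on EMR startup, which is what I am asking about.

```sql
-- Assumed placement: top of hive_script.hql, before the partition statements.
-- hive.exec.parallel must be enabled for the thread-number setting to matter.
SET hive.exec.parallel=true;
-- Upper bound on the number of jobs Hive runs at the same time (assumed value).
SET hive.exec.parallel.thread.number=8;
```

Note that the GNU parallel command above passes each input line to its own hive process, so it only works as intended if every statement in the script fits on a single line.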