I would suggest looking at FileCrush (https://github.com/edwardcapriolo/filecrush), a tool that merges small files on HDFS using MapReduce. It does exactly what you described and provides several options for handling compression and controlling the number of output files.
Crush --max-file-blocks XXX /data/input /data/output
The max-file-blocks option sets the maximum number of dfs blocks per output file. For example, according to the documentation:
With the default value of 8, 80 small files, each 1/10th of a dfs block, will be grouped into a single output file, since 80 * 1/10 = 8 dfs blocks. If there are 81 small files, each 1/10th of a dfs block, two output files will be created: one will contain the combined contents of 41 files and the other the combined contents of the remaining 40. A directory of many small files will be converted into a smaller number of larger files, where each output file is roughly the same size.
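
To make that grouping rule concrete, here is a minimal sketch of the arithmetic in Java. The class, method, and parameter names are illustrative only and are not part of FileCrush's API; it simply estimates the number of output files as the total input size divided by max-file-blocks worth of dfs blocks, rounded up.

```java
public class CrushEstimate {

    // Estimate how many output files would be produced for `numFiles`
    // small files of `fileSizeBytes` each, given the dfs block size and
    // the --max-file-blocks setting. This mirrors the documentation's
    // example; it is not FileCrush code.
    static long estimateOutputFiles(long numFiles, long fileSizeBytes,
                                    long blockSizeBytes, long maxFileBlocks) {
        double totalBlocks = (double) (numFiles * fileSizeBytes) / blockSizeBytes;
        return (long) Math.ceil(totalBlocks / maxFileBlocks);
    }

    public static void main(String[] args) {
        long blockSize = 128L * 1024 * 1024;   // assumed 128 MB dfs block
        long smallFile = blockSize / 10;       // each file is 1/10th of a block

        // 80 files * 1/10 block = 8 blocks -> fits in one output file
        System.out.println(estimateOutputFiles(80, smallFile, blockSize, 8)); // 1

        // 81 files -> 8.1 blocks -> needs two output files
        System.out.println(estimateOutputFiles(81, smallFile, blockSize, 8)); // 2
    }
}
```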