I have user access to a Hadoop server/cluster containing data that is stored solely in partitioned Hive tables/files (Avro format). I was wondering whether I can run MapReduce jobs with the Python mrjob library on these tables. So far I have only been testing mrjob locally on text files stored on CDH5, and I am impressed by how easy it is to develop with.
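For context, the kind of job I have been testing locally is essentially the standard word-count example from the mrjob docs (file and class names here are just illustrative):

```python
from mrjob.job import MRJob


class MRWordCount(MRJob):
    """Count word occurrences in plain-text input."""

    def mapper(self, _, line):
        # Emit each word with a count of 1.
        for word in line.split():
            yield word.lower(), 1

    def reducer(self, word, counts):
        # Sum all counts emitted for the same word.
        yield word, sum(counts)


if __name__ == '__main__':
    MRWordCount.run()
```

I run it locally with something like `python word_count.py -r local input.txt`, and switching to `-r hadoop` for plain text files works fine.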
After some research I discovered HCatalog, but as far as I can tell it is not available for Python (Java only). Unfortunately, I do not have much time to learn Java, so I would like to stick to Python.
Does anyone know of a way to run mrjob on Hive-stored data?
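I have not been able to test this against the cluster yet, but is something along these lines even viable? This is only a guess, assuming mrjob's `HADOOP_INPUT_FORMAT` setting together with Avro's `AvroAsTextInputFormat` (which, as I understand it, decodes each Avro record into JSON text); `some_column` is just a placeholder field name:

```python
import json

from mrjob.job import MRJob


class MRAvroCount(MRJob):
    """Sketch: count values of one field across Avro-backed Hive files."""

    # Ask Hadoop to decode the Avro files into JSON text before streaming
    # them to the mapper. This presumably needs the avro / avro-mapred jars
    # to be available on the cluster's classpath.
    HADOOP_INPUT_FORMAT = 'org.apache.avro.mapred.AvroAsTextInputFormat'

    def mapper(self, _, line):
        # Each input line should be one Avro record rendered as JSON.
        record = json.loads(line.strip())
        # 'some_column' is a placeholder for a real field in the Avro schema.
        yield record.get('some_column'), 1

    def reducer(self, key, counts):
        yield key, sum(counts)


if __name__ == '__main__':
    MRAvroCount.run()
```

If that is a reasonable direction, I could point it at the partition directories under the Hive warehouse path, but I am not sure whether that is the intended way to do this.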
If that is not possible, is there a way to stream Python-written MapReduce code to Hive? (I would rather not have to upload my MapReduce Python files to Hive.)
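To clarify what I mean by streaming: the closest built-in option I have found is Hive's TRANSFORM clause, which pipes rows through an external script, but as I understand it the script still has to be registered with ADD FILE, which is exactly the kind of uploading I was hoping to avoid. A rough sketch of that pattern (table, column, and file names are made up):

```python
#!/usr/bin/env python
# Plain streaming script that Hive would invoke via TRANSFORM, e.g.:
#
#   ADD FILE my_mapper.py;
#   SELECT TRANSFORM (col_a, col_b)
#   USING 'python my_mapper.py'
#   AS (col_a, cnt)
#   FROM my_partitioned_table;
import sys

for line in sys.stdin:
    # Hive streams each row to stdin as tab-separated column values.
    cols = line.rstrip('\n').split('\t')
    # Whatever goes to stdout (tab-separated) becomes the query output.
    print('\t'.join([cols[0], '1']))
```

Is there a cleaner way to get the same effect, ideally driven entirely from mrjob on my side?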