I have a web application and one or more remote Hadoop clusters. These clusters can be on different machines. I want to perform the following operations from my web application:
1. HDFS actions:
- Create New Directory
- Remove files from HDFS (Hadoop Distributed File System)
- List Files present on HDFS
- Load a file onto HDFS
- Unload a file from HDFS
2. Job-related actions:
- Submit Map Reduce Jobs
- View their status, i.e. how much of a job has completed
- View the time taken by a job to finish
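To make the HDFS actions above concrete, here is a minimal sketch of the kind of remote calls I have in mind, assuming the cluster exposes Hadoop's WebHDFS REST interface (which may or may not be enabled on a given cluster). The operation names (`MKDIRS`, `DELETE`, `LISTSTATUS`, `CREATE`, `OPEN`) are WebHDFS operations; the helper name `webhdfs_request`, the host name, the port and the paths are hypothetical placeholders. The code only builds the HTTP method and URL and does not contact any cluster.

```python
from urllib.parse import quote, urlencode

# Assumed mapping from the actions listed above to WebHDFS (HTTP method, op) pairs.
WEBHDFS_OPS = {
    "mkdir": ("PUT", "MKDIRS"),        # create new directory
    "remove": ("DELETE", "DELETE"),    # remove files
    "list": ("GET", "LISTSTATUS"),     # list files
    "load": ("PUT", "CREATE"),         # load a file onto HDFS
    "unload": ("GET", "OPEN"),         # read a file back from HDFS
}


def webhdfs_request(host, port, action, hdfs_path, **params):
    """Return the (HTTP method, URL) pair for one HDFS action.

    host and port identify the NameNode's HTTP endpoint; both are
    placeholders here and depend on the cluster's configuration.
    """
    method, op = WEBHDFS_OPS[action]
    query = urlencode({"op": op, **params})
    url = f"http://{host}:{port}/webhdfs/v1{quote(hdfs_path)}?{query}"
    return method, url


# Hypothetical NameNode address and paths, for illustration only.
for action, path, extra in [
    ("mkdir", "/user/demo/new_dir", {}),
    ("list", "/user/demo", {}),
    ("remove", "/user/demo/old.txt", {"recursive": "false"}),
]:
    print(webhdfs_request("namenode.example.com", 50070, action, path, **extra))
```

With WebHDFS, loading a file is a two-step exchange: the NameNode answers the `CREATE` request with a redirect to a DataNode, and the file contents are sent to that redirect location. Since the web application only needs an HTTP client for this, it would fit the "nothing extra installed on each cluster" constraint, provided WebHDFS is enabled.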
I need a tool that can help me do these tasks from the web application, e.g. via an API or via REST calls. I'm assuming that the tool will run on the same machine as the web application and can be pointed at a particular remote cluster.
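For the job-related actions, this is roughly the shape of REST interaction I am imagining. It is only a sketch, assuming the cluster runs YARN and its ResourceManager REST API is reachable; in that API an application's status is served under `/ws/v1/cluster/apps/{appid}` with fields such as `state`, `progress` (a percentage) and `elapsedTime` (milliseconds). The function names, host, port, application id and the sample response below are hypothetical, and no network call is made.

```python
import json


def app_status_url(rm_host, rm_port, app_id):
    """Build the status URL for one application (host and port are placeholders)."""
    return f"http://{rm_host}:{rm_port}/ws/v1/cluster/apps/{app_id}"


def summarize_app(response_text):
    """Extract progress and elapsed time from a status response body."""
    app = json.loads(response_text)["app"]
    return {
        "state": app["state"],
        "percent_complete": app["progress"],
        "elapsed_seconds": app["elapsedTime"] / 1000.0,  # API reports milliseconds
    }


# Hand-written sample response, NOT real cluster output.
sample = json.dumps(
    {"app": {"state": "RUNNING", "progress": 42.5, "elapsedTime": 90000}}
)

print(app_status_url("rm.example.com", 8088, "application_0000000000000_0001"))
print(summarize_app(sample))
```

The web application would issue a GET against the URL with any HTTP client and pass the response body to `summarize_app`; polling it periodically would cover both "how much of the job has completed" and "time taken".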
As a last resort, I'm wondering whether there is some Hadoop library or plug-in that sits on the cluster, allows access from remote machines, and performs the tasks mentioned above. I would rather avoid this: since there can be multiple, disparate clusters, it would be difficult to ensure that each of them has the plug-in, library, etc. installed.