I need to transfer data fairly regularly (on demand, not scripted or streamed) between two independent Hadoop clusters, one of which is deployed in an isolated network and has no direct access to the other.
I searched the official documentation and the web for answers, but this seems to be a rather non-trivial task: the only answers I found relate to proxying a REST service.
Is there some way to proxy distcp functionality?
Alternatively, is there some other efficient (and ideally scalable) way to transfer data between two isolated Hadoop clusters via some kind of temporary storage?