
http://testing:50070/webhdfs/v1/Test/asaw4zds_ssdf4_ht35-9a1a-4a7b-9n.jpg?op=OPEN

I am fetching the above image from Hadoop through WebHDFS. I want the browser to cache this image. Is there any mechanism to cache images coming from Hadoop, and how can I hide the port number in this URL?

tina
  • You can set your web server to redirect `http://testing/webhdfs/v1/Test/asaw4zds_ssdf4_ht35-9a1a-4a7b-9n.jpg?op=OPEN` to `http://testing:50070/webhdfs/v1/Test/asaw4zds_ssdf4_ht35-9a1a-4a7b-9n.jpg?op=OPEN`. And most web servers let you set the expiry time. – zsxwing Aug 31 '13 at 12:57
  • This data is dynamic: the image name and the host change. For example, another image could be http://testingTwo.com/webhdfs/v1/Test/anotherImagetwo.jpg?op=OPEN. I have put the URL in an img src, so it is fetched directly from Hadoop, not from my server. – tina Aug 31 '13 at 19:20
  • I suggest you not expose the Hadoop web interface directly. It uses a simple Jetty server which is not optimized for this. Use Nginx or Apache httpd as a reverse proxy in front of the Hadoop web interface. – zsxwing Sep 01 '13 at 07:23
  • But we cannot access files from Hadoop without WebHDFS. – tina Sep 03 '13 at 11:56

2 Answers


I'm not familiar with WebHDFS, but if it does not support caching itself, you have to put a caching layer between the client and the WebHDFS server.

The thing you need is a reverse proxy with caching enabled. There are several ways to set it up, but Apache mod_cache or Nginx reverse-proxy caching will serve you just fine.

To hide the port from the URL, run the web server/proxy on port 80. Then create a proxy alias on the /proxy context that forwards requests to http://testing:50070/webhdfs, and enable caching. Finally, you can request your WebHDFS resources through the caching proxy at http://testing/proxy/v1/Test/asaw4zds_ssdf4_ht35-9a1a-4a7b-9n.jpg?op=OPEN
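A minimal Nginx sketch of that setup (host names, cache path, and TTLs are assumptions; adjust for your cluster):

```nginx
# Disk cache zone: 10 MB of keys, entries dropped after 60 min of inactivity.
proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=webhdfs_cache:10m inactive=60m;

server {
    listen 80;
    server_name testing;   # public host, no port in the URL

    location /proxy/ {
        # Forward /proxy/... to the WebHDFS endpoint on the NameNode.
        proxy_pass http://testing:50070/webhdfs/;

        proxy_cache webhdfs_cache;
        proxy_cache_valid 200 1h;                          # cache successful responses for 1 hour
        proxy_cache_key "$scheme$host$uri$is_args$args";   # keep ?op=OPEN in the cache key
        add_header X-Cache-Status $upstream_cache_status;  # useful for debugging hits/misses
    }
}
```

One caveat: WebHDFS typically answers op=OPEN with an HTTP redirect to a DataNode, so depending on your setup you may also need to proxy the DataNode addresses, or the client will still see internal host:port pairs.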

The communication will look like:

Client 1:00PM <> Proxy (no cache) <> Webhdfs (asaw4zds_ssdf4_ht35-9a1a-4a7b-9n.jpg)
Client 2:00PM <> Proxy (asaw4zds_ssdf4_ht35-9a1a-4a7b-9n.jpg) expires in 1h
Client 2:45PM <> Proxy (asaw4zds_ssdf4_ht35-9a1a-4a7b-9n.jpg) expires in 15min
Client 4:00PM <> Proxy (asaw4zds_ssdf4_ht35-9a1a-4a7b-9n.jpg) expired!! <> Webhdfs (asaw4zds_ssdf4_ht35-9a1a-4a7b-9n.jpg)
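The equivalent Apache httpd sketch with mod_proxy and mod_cache (cache root and TTL are assumptions):

```apache
# Requires: mod_proxy, mod_proxy_http, mod_cache, mod_cache_disk
CacheRoot /var/cache/apache2/webhdfs
CacheEnable disk /proxy
CacheDefaultExpire 3600       # fall back to a 1-hour TTL when upstream sets no expiry
CacheIgnoreQueryString Off    # keep ?op=OPEN as part of the cache key

ProxyPass        /proxy http://testing:50070/webhdfs
ProxyPassReverse /proxy http://testing:50070/webhdfs
```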

You can find complete configuration examples in the Apache and Nginx documentation. You choose.

Milan Baran
2

I know this is a late response, but Apache Knox is a REST API gateway for Hadoop. One of its specific goals is to hide internal topology information, such as host names and ports, from consumers. See the Apache Knox project site for details.
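For illustration, a Knox topology file maps WebHDFS behind the gateway roughly like this (the topology name and host are assumptions):

```xml
<!-- conf/topologies/sandbox.xml -->
<topology>
  <gateway>
    <!-- authentication/authorization providers go here -->
  </gateway>
  <service>
    <role>WEBHDFS</role>
    <url>http://testing:50070/webhdfs</url>
  </service>
</topology>
```

Clients then request https://knox-host:8443/gateway/sandbox/webhdfs/v1/... and never see the NameNode's host or port.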

lmccay