
I am working on a project where I have extracted images from a sensor and saved them to a directory on the operating system. I have a Java API for uploading images to the server.

I need to upload these images, together with some other data (mostly float values), to the main server.

I need to decide on an intermediary: either a database where I store those images and connect through Java to upload them, or HDFS.

Can somebody please advise me which option is best for storing the images: a database or HDFS?

Note: there are up to 150 thousand images, and this number may grow in the future.


3 Answers


It depends entirely on the use case. You can choose:

  1. HDFS: when you want to read the images as a whole, transfer them, or process them, i.e. manipulate the image data and store or act on the processed results. In short, when you want to run MapReduce jobs over them. Note that reads in HDFS are sequential, so fetching a particular image by some selection criterion is a costly operation with a real performance impact.
  2. Database: better for query-based operations, where you want to query or run DML against the images based on certain criteria, in short, WHERE conditions. But it is time-consuming when you want to process the data in bulk, and performance will clearly suffer when you are storing 150 thousand images.

Since your requirement is to store the images as an intermediate step, my suggestion is to store them in HDFS itself; a sketch of uploading the files to HDFS follows.
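As a rough illustration only, here is a minimal sketch of copying the extracted image files from a local directory into HDFS with the Hadoop `FileSystem` API. The namenode address, local directory, and target path are assumptions and would need to match your cluster and layout.

```java
import java.io.File;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsImageUploader {
    public static void main(String[] args) throws Exception {
        // Assumed cluster address and paths -- adjust to your environment
        URI hdfsUri = URI.create("hdfs://namenode:8020");
        Configuration conf = new Configuration();

        try (FileSystem fs = FileSystem.get(hdfsUri, conf)) {
            File localDir = new File("/data/sensor-images");   // assumed local directory
            Path targetDir = new Path("/sensor/images");        // assumed HDFS directory
            fs.mkdirs(targetDir);

            File[] images = localDir.listFiles((dir, name) -> name.endsWith(".png"));
            if (images == null) {
                throw new IllegalStateException("Local image directory not found: " + localDir);
            }

            // Copy every extracted image from the local directory into HDFS
            for (File image : images) {
                fs.copyFromLocalFile(new Path(image.getAbsolutePath()),
                                     new Path(targetDir, image.getName()));
            }
        }
    }
}
```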

ideano1

150,000 images is not considered a huge amount today. If you assume an average of 10 MB per image (uncompressed), the total is about 1.5 TB, which an off-the-shelf database such as PostgreSQL should be able to store on off-the-shelf hardware, e.g. a Linux box with some RAID disks. I'm no expert in HDFS, but having tried products from the same family I find them easy to use, so you could also try Hadoop if you are looking for a way to parallelize the processing of the images. Even though that product family is nice, I would still use a standard database like PostgreSQL if you don't really need the kind of parallelisation HDFS gives you.
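If you go the PostgreSQL route, a hedged sketch of inserting one image plus its float reading over JDBC might look like the following. The table name, columns, connection URL, and credentials are assumptions; the image bytes are streamed into a `bytea` column.

```java
import java.io.File;
import java.io.FileInputStream;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class PostgresImageStore {
    public static void main(String[] args) throws Exception {
        // Assumed table: CREATE TABLE sensor_image(id serial, reading real, data bytea);
        String url = "jdbc:postgresql://localhost:5432/sensordb";  // assumed connection details
        File image = new File("/data/sensor-images/frame_0001.png");
        float reading = 42.5f;  // the float value that belongs to this image

        try (Connection con = DriverManager.getConnection(url, "sensor", "secret");
             FileInputStream in = new FileInputStream(image);
             PreparedStatement ps = con.prepareStatement(
                     "INSERT INTO sensor_image(reading, data) VALUES (?, ?)")) {
            ps.setFloat(1, reading);
            // Stream the file content into the bytea column
            ps.setBinaryStream(2, in, image.length());
            ps.executeUpdate();
        }
    }
}
```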

ASF

I think the best approach is to keep the floats you need and the image metadata in the database, which makes searching, querying, and interaction from Java easier. The actual images are best stored on a file system, so you avoid converting them to and from the database. A plain file system should be good enough for that number of images; you probably won't use any of the fancier HDFS features such as MapReduce, but that's up to you.

If a standard file system turns out not to be enough and you want something bigger, then HDFS is the way to go. So the proper solution is a mixture of the two; a sketch of that hybrid approach follows.
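A rough sketch of the hybrid approach, under assumptions: the image is moved to a permanent directory on disk, and only its path, capture time, and float value go into a hypothetical `sensor_frame` table. Paths, table layout, and connection details are all placeholders.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.Timestamp;
import java.time.Instant;

public class HybridImageStore {
    public static void main(String[] args) throws Exception {
        // Move the extracted image into a permanent directory on the file system
        Path source = Paths.get("/tmp/extracted/frame_0001.png");   // assumed source path
        Path stored = Paths.get("/data/images/frame_0001.png");     // assumed storage path
        Files.createDirectories(stored.getParent());
        Files.move(source, stored, StandardCopyOption.REPLACE_EXISTING);

        // Assumed table: CREATE TABLE sensor_frame(id serial, path text, captured_at timestamp, reading real);
        String url = "jdbc:postgresql://localhost:5432/sensordb";
        try (Connection con = DriverManager.getConnection(url, "sensor", "secret");
             PreparedStatement ps = con.prepareStatement(
                     "INSERT INTO sensor_frame(path, captured_at, reading) VALUES (?, ?, ?)")) {
            ps.setString(1, stored.toString());                 // where the image lives on disk
            ps.setTimestamp(2, Timestamp.from(Instant.now()));  // when the frame was captured
            ps.setFloat(3, 17.3f);                              // the float value for this frame
            ps.executeUpdate();
        }
    }
}
```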

Veselin Davidov