My objective is to apply the map-reduce framework to cluster images using Hadoop. For map-reduce I am using Python with the mrjob package, but I cannot work out the logic for processing the images. My images are in .tif format. My questions are:

  1. In what format should the images be stored in HDFS so that they can be retrieved for map-reduce in Python?
  2. What does the I/O pipeline between Python and Hadoop look like? I have not been able to figure it out.
Alay Majmudar
  • Why not use PySpark? Then TensorFlow can be used – OneCricketeer Sep 10 '18 at 14:22
  • But even with PySpark I would have to store and retrieve data from HDFS. How exactly the image would be stored is the question. In what format? – Alay Majmudar Sep 10 '18 at 18:59
  • Hadoop isn't a database. You can store raw JPG, TIF, PNG, whatever... If you archive lots of images, a SequenceFile or Bzip2 might be better, but only for compression – OneCricketeer Sep 10 '18 at 19:47
  • How can I achieve that using Python? How do I access HDFS files directly in Python? – Alay Majmudar Sep 10 '18 at 20:11
  • Spark has APIs to do this. https://spark.apache.org/docs/latest/api/python/pyspark.html#pyspark.SparkContext.binaryFiles Otherwise, http://wesmckinney.com/blog/python-hdfs-interfaces/ – OneCricketeer Sep 10 '18 at 21:43
  • Though, most people would prefer Scala or Java for fast Hadoop/Spark jobs. https://stackoverflow.com/questions/44890381/is-it-possible-to-read-pdf-audio-video-filesunstructured-data-using-apache-spa – OneCricketeer Sep 10 '18 at 21:47
  • I'm not sure why you essentially post the same question multiple times. As I've linked to, Spark can read files as raw binary. You don't "upload binary" to Hadoop (all files are already just binary data anyway, some just have extra metadata). You can read over this for ideas https://stackoverflow.com/questions/tagged/image-processing+hadoop?sort=votes – OneCricketeer Sep 11 '18 at 06:46
  • Thanks a lot for all your advice and suggestions. In another question I clarified a bit more about what I have explored and done. In this thread I was asking for a general exploration of the topic – Alay Majmudar Sep 11 '18 at 09:41
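
For the I/O pipeline part of the question, one possible approach (not something proposed in the thread) follows from the fact that mrjob runs on Hadoop Streaming, which by default hands the mapper lines of text. Binary image bytes therefore need a text-safe encoding if they are to travel through the job as records. The sketch below is only an illustration under that assumption: it base64-encodes each image into one tab-separated line and decodes it again in a map step. The names `encode_record`, `decode_record`, `mapper`, and the fake image bytes are all hypothetical, and a real job would replace the byte count with actual feature extraction.

```python
import base64


def encode_record(name: str, data: bytes) -> str:
    """Turn one image into one text line: '<name>\t<base64 payload>'.

    Assumes the file name contains no tab or newline. The base64
    alphabet itself never contains either character.
    """
    return name + "\t" + base64.b64encode(data).decode("ascii")


def decode_record(line: str):
    """Inverse of encode_record: recover (name, raw bytes) from a line."""
    name, payload = line.rstrip("\n").split("\t", 1)
    return name, base64.b64decode(payload)


def mapper(lines):
    """Hypothetical map step: emit (image name, size in bytes).

    A real clustering job would decode the TIFF here and emit a
    feature vector instead of the byte count.
    """
    for line in lines:
        name, data = decode_record(line)
        yield name, len(data)


# Stand-in bytes, not real TIFF files (assumption for the demo).
fake_images = {
    "a.tif": b"II*\x00" + bytes(12),
    "b.tif": b"II*\x00" + bytes(28),
}

# What would be written to a text file in HDFS, one record per line.
records = [encode_record(name, data) for name, data in fake_images.items()]

# What the mapper would see on its input, and what it would emit.
results = dict(mapper(records))
print(results)
```

Base64 grows the data by roughly a third, so this trades storage for simplicity. Keeping the raw .tif files in HDFS and reading them as binary, as the comments suggest for Spark, avoids that overhead.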

0 Answers