I need to read the metadata of files stored in Azure Data Lake Store.

The files may be in JPEG, Excel, or TIFF format.

I am using Microsoft Azure Data Lake Store with U-SQL. Any suggestions would be appreciated.

Peter Bons
Sgupta
  • Your question is too vague for people to help. Please add more detail, an example of what you are trying to read, the source code you have written. You can edit your question by clicking the edit link. – tom May 03 '18 at 02:57
  • What type of meta data? The EXIF data inside JPEG? The creation date of the files? What are you going to do with it? Use it in a U-SQL script? In an application? – Michael Rys May 03 '18 at 03:04
  • @tom Hi Tom, I need to retrieve the metadata of JPEG and TIFF images to use for data analytics. Can you please help here? – Sgupta May 03 '18 at 04:33
  • @MichaelRys I will be using it for data analytics. Can you please let me know how to retrieve the metadata? I really need help. – Sgupta May 03 '18 at 05:22

1 Answer

At the moment that is not supported. According to the feedback site, it seems to be on the backlog.

You might be able to write a custom extractor as suggested in the link:

In case it is available, like EXIF in JPEG - extract some of the properties from the content using a custom extractor.

According to this blog post, they have done this for image property extraction; see the repo. It can serve as a guide for implementing this in your scenarios. Here is an example query:

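// Assumption: the Images assembly from the sample repo has been built and
// registered in the database before this script runs.
REFERENCE ASSEMBLY Images;

// Extract selected image properties plus a scaled thumbnail.
// {name} and {format} are virtual columns filled in from the file path.
@image_features =
    EXTRACT copyright string,
            equipment_make string,
            equipment_model string,
            description string,
            thumbnail byte[],
            name string,
            format string
    FROM @"/Samples/Data/Images/{name}.{format}"
    USING new Images.ImageFeatureExtractor(scaleWidth: 500, scaleHeight: 300);

// Keep only JPEG files, based on the file extension.
@image_features =
    SELECT *
    FROM @image_features
    WHERE format IN("JPEG", "jpeg", "jpg", "JPG");

OUTPUT @image_features
TO @"/output/images/image_features.csv"
USING Outputters.Csv();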

Alternatively, have another process extract those properties and write them to a metadata file in Azure Data Lake Store, so that you can join against that file in U-SQL.
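As a rough illustration of that second option, here is a minimal Python sketch of such an external process. It is not taken from the linked blog post or repo. It assumes the files have been copied to a local folder, and the function names (`sniff_format`, `build_metadata_rows`, `write_metadata_csv`) and the column layout are hypothetical. It records only what the standard library can determine (name, format guessed from the leading bytes, size, modification time). Reading EXIF fields would need an image library on top of this.

```python
import csv
import os
from datetime import datetime, timezone

# Leading "magic" bytes for the formats mentioned in the question.
MAGIC = [
    (b"\xff\xd8\xff", "JPEG"),
    (b"II*\x00", "TIFF"),          # little-endian TIFF
    (b"MM\x00*", "TIFF"),          # big-endian TIFF
    (b"PK\x03\x04", "XLSX"),       # ZIP container; other ZIP-based files match too
    (b"\xd0\xcf\x11\xe0", "XLS"),  # OLE2 container; other legacy Office files match too
]


def sniff_format(path):
    """Guess the format from the first bytes; 'UNKNOWN' if nothing matches."""
    with open(path, "rb") as fh:
        head = fh.read(8)
    for magic, label in MAGIC:
        if head.startswith(magic):
            return label
    return "UNKNOWN"


def build_metadata_rows(folder):
    """Return one dict per regular file in `folder`, sorted by file name."""
    rows = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if not os.path.isfile(path):
            continue
        stat = os.stat(path)
        modified = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)
        rows.append({
            "name": name,
            "format": sniff_format(path),
            "size_bytes": stat.st_size,
            "modified_utc": modified.strftime("%Y-%m-%dT%H:%M:%SZ"),
        })
    return rows


def write_metadata_csv(rows, out_path):
    """Write the rows as a CSV file with a header line."""
    fields = ["name", "format", "size_bytes", "modified_utc"]
    with open(out_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fields)
        writer.writeheader()
        writer.writerows(rows)
```

The resulting CSV would then be uploaded to Data Lake Store by whatever means you already use, read in U-SQL with `Extractors.Csv(skipFirstNRows: 1)`, and joined to your other rowsets on `name`.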

Peter Bons
  • Can you help me with the code for a custom extractor? I have tried but had a tough time and couldn't get it working. – Sgupta May 03 '18 at 05:42
  • Sorry that is a bit out of scope, you can google for how to create custom extractors, they have some examples [here](https://github.com/Azure/usql/tree/master/Examples/DataFormats/Microsoft.Analytics.Samples.Formats) as well. – Peter Bons May 03 '18 at 05:42
  • So a different extractor needs to be written for every format? Is my understanding correct? – Sgupta May 03 '18 at 05:44
  • It would be really helpful if you could share an extractor for the TIFF or JPEG format. – Sgupta May 03 '18 at 06:23
  • But I did, see the links in my answer. – Peter Bons May 03 '18 at 06:39
  • If you could just give an answer I can copy and paste, that would be really helpful. – Sgupta May 03 '18 at 06:45
  • There is no copy / paste answer as there is some work on your part that needs to be done. Look, I gave you all the info you need in order to do it. There is just not much more for me to do other than sit next to you and do it together and obviously that is going to be difficult ;-) – Peter Bons May 03 '18 at 19:28
  • I agree, Thanks much Peter. – Sgupta May 03 '18 at 19:29