
I need to deploy a Presto server to query data stored in ADLS in Avro format. I have gone through this tutorial, and it seems that Hive is used as the catalog/connector in Presto to query ADLS. Can I bypass Hive and have any connector to extract data from ADLS?

Egon Allison
Bhanuday Birla

1 Answer


Can I bypass Hive and have any connector to extract data from ADLS?

No.

Hive plays two roles here:

  • storage for metadata. It contains information like:
    • schema and table name
    • columns
    • data format
    • data location
  • execution
    • it can read data from distributed file systems (like HDFS, S3, ADLS)
    • it tells how execution can be distributed.
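
As a rough illustration of how the metadata items above (table name, columns, data format, data location) end up in the metastore, here is a minimal sketch that assembles a Presto `CREATE TABLE` statement for an external Avro table. It assumes the Hive connector's `format` and `external_location` table properties; the helper name, table name, columns, and `adl://` path are hypothetical placeholders, and the cluster is assumed to be configured with ADLS credentials already.

```python
def external_avro_table_ddl(table, columns, location):
    """Build a Presto CREATE TABLE statement for an external Avro table.

    table    -- fully qualified name (catalog.schema.table)
    columns  -- list of (column_name, presto_type) pairs
    location -- directory that holds the Avro files
    """
    column_sql = ",\n".join(f"    {name} {type_}" for name, type_ in columns)
    return (
        f"CREATE TABLE {table} (\n"
        f"{column_sql}\n"
        ")\n"
        "WITH (\n"
        "    format = 'AVRO',\n"
        f"    external_location = '{location}'\n"
        ")"
    )


# Hypothetical values: replace the table name, columns, and ADLS path
# with the ones that match your data.
ddl = external_avro_table_ddl(
    "hive.default.events",
    [("id", "bigint"), ("payload", "varchar")],
    "adl://myaccount.azuredatalakestore.net/data/events/",
)
print(ddl)
```

Once a statement like this has been run in Presto, the data is queried by table name (here `SELECT * FROM hive.default.events`), not by its location.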
kokosing
  • Thanks for this crucial information. If I have some data in ADLS that did not come through Hive (meaning the metastore for this data won't be there), then how can I query that data using Presto? – Bhanuday Birla Feb 28 '19 at 12:39
  • You need to create an external table with a location that points to that data. – kokosing Feb 28 '19 at 12:41
  • So if I know the location of the data, then I can use select * from hive.default.LOCATION without any schema, right? – Bhanuday Birla Feb 28 '19 at 12:45