Which file formats does Presto support? Are any specific file formats recommended for better performance? In particular, is there a columnar file format like RCFile that's optimized for Presto?

Animesh Raj Jha

We test every Trino (formerly PrestoSQL) release with Parquet, ORC, RCFile, Avro, SequenceFile, TextFile, and other formats, but Presto should support any standard Hadoop file format. At Facebook most of our data is in ORC format, so currently this format has the best performance on Presto.

Dain Sundstrom

ORC is the best optimized. Parquet is also pretty good, and more optimizations are coming thanks to Netflix.

For the current version of Presto, I recommend using ORC files. Dain has finished the new ORC reader in Presto, and it is very fast. Here is the blog post: https://code.facebook.com/posts/370832626374903/even-faster-data-at-the-speed-of-presto-orc/
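If existing data is in a slower format, one common way to follow this advice is a CREATE TABLE AS statement that rewrites the table with the Hive connector's `format = 'ORC'` table property. The sketch below only builds that SQL string; the table names are hypothetical placeholders, and you should confirm the property name against the Hive connector documentation for your Presto/Trino version.

```python
import re

# Hypothetical name check: "catalog.schema.table" (1 to 3 parts), each a plain SQL identifier.
_QUALIFIED_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*){0,2}$")


def orc_ctas(source_table: str, target_table: str) -> str:
    """Build a CREATE TABLE AS statement copying source_table into a new ORC table."""
    for name in (source_table, target_table):
        if not _QUALIFIED_NAME.match(name):
            raise ValueError(f"unsupported table name: {name!r}")
    # The WITH (format = ...) table property selects the storage format in the Hive connector.
    return (
        f"CREATE TABLE {target_table}\n"
        f"WITH (format = 'ORC')\n"
        f"AS SELECT * FROM {source_table}"
    )


if __name__ == "__main__":
    # hive.default.events_text and hive.default.events_orc are placeholder names.
    print(orc_ctas("hive.default.events_text", "hive.default.events_orc"))
```

The generated statement would be run through the Presto CLI or any client; the original table is left untouched, so you can compare query times on both copies before switching over.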

袁安峰

At present, the Text, SequenceFile, RCFile, ORC, and Parquet file formats are supported by Presto. Reference: https://prestodb.io/overview.html

venus

The following file types are supported by the Hive connector:

• ORC
• Parquet
• Avro
• RCFile
• SequenceFile
• JSON
• Text

In my experience, the best optimized formats are ORC and Parquet.
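In SQL, the format is chosen per table through the Hive connector's `format` table property, whose values are not always spelled like the names in the list above. The mapping below is an assumption based on the connector's storage-format names (for example, RCFile has two variants, RCBINARY and RCTEXT, and RCBINARY is picked here arbitrarily); check the documentation for your version before relying on it.

```python
# Assumed mapping from the format names listed above to Hive connector
# `format` table-property values. Verify against your Presto/Trino version.
HIVE_FORMAT_PROPERTY = {
    "orc": "ORC",
    "parquet": "PARQUET",
    "avro": "AVRO",
    "rcfile": "RCBINARY",  # assumption: RCTEXT is the other RCFile variant
    "sequencefile": "SEQUENCEFILE",
    "json": "JSON",
    "text": "TEXTFILE",
}


def with_clause(file_format: str) -> str:
    """Return a WITH clause that selects the given storage format in CREATE TABLE."""
    key = file_format.strip().lower()
    if key not in HIVE_FORMAT_PROPERTY:
        raise ValueError(f"format not in the list above: {file_format!r}")
    return f"WITH (format = '{HIVE_FORMAT_PROPERTY[key]}')"


if __name__ == "__main__":
    # The two formats recommended in this answer.
    for name in ("ORC", "Parquet"):
        print(with_clause(name))
```

The clause would be appended to a `CREATE TABLE ... (columns)` statement; tables created without it use the connector's configured default format.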