
Is there a way to see the contents of an ORC file, the format that Hive 0.11 and above uses? I usually view gz files by decompressing them on the fly, e.g.:

cat part-0000.gz | pigz -d | more

Note: pigz is a parallel gzip program.

I would like to know if there is something similar to this for ORC files.
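Why there is no direct counterpart to `pigz -d`: a .gz file is one compressed stream that starts with the gzip magic bytes 0x1f 0x8b, whereas an ORC file starts with the ASCII bytes "ORC" and stores column data in stripes. When a codec such as zlib or Snappy is used, it is applied in chunks to every part of the file except the PostScript, so you need a reader that understands the file's footer rather than a stream decompressor. The following is a minimal stdlib-only Python sketch that only tells the two formats apart and reads the PostScript length from the final byte, which is where an ORC reader starts. The function names are hypothetical and not part of any ORC library.

```python
import os

GZIP_MAGIC = b"\x1f\x8b"  # every gzip member starts with these two bytes (RFC 1952)
ORC_MAGIC = b"ORC"        # an ORC file starts with the three ASCII bytes "ORC"


def sniff_format(f):
    """Hypothetical helper: classify a seekable binary file object.

    Returns "gzip", "orc" or "unknown" based only on the leading magic bytes.
    """
    f.seek(0)
    head = f.read(3)
    if head[:2] == GZIP_MAGIC:
        return "gzip"
    if head == ORC_MAGIC:
        return "orc"
    return "unknown"


def orc_postscript_length(f):
    """Hypothetical helper: return the PostScript length of an ORC file.

    The last byte of an ORC file stores the length of the PostScript, which
    records the compression codec and the footer length; the footer in turn
    describes the schema and the stripes. Readers therefore start at the end.
    """
    f.seek(-1, os.SEEK_END)
    return f.read(1)[0]
```

For a local file, open it with open(path, "rb") and pass the file object to either function.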

viper

3 Answers


There is now also a native executable for Linux and macOS that prints the contents of an ORC file as JSON. See the ORC project (http://orc.apache.org/) and build the C++ tools.

% orc-contents examples/TestOrcFile.test1.orc

There is also a native metadata tool:

% orc-metadata ../examples/TestOrcFile.test1.orc

The ORC project also has a standalone uber jar that can do the same from Java.

% java -jar orc-tools-1.2.3-uber.jar data myfile.orc
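Both orc-contents and the uber jar's data command write to stdout, so the "| more" part of the gz pipeline carries over unchanged (e.g. orc-contents myfile.orc | more). If you want the same preview from a script, here is a minimal Python sketch, assuming the tool is on your PATH; the helper name preview is hypothetical. It keeps only the first few lines and then stops the child process, so a large file is not dumped in full.

```python
import itertools
import subprocess


def preview(cmd, max_lines=20):
    """Hypothetical helper: run cmd and return at most max_lines lines of stdout.

    Roughly equivalent to `cmd | head -n max_lines`: once enough lines have
    been read, the child is terminated instead of being left to finish.
    """
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc:
        lines = [line.rstrip("\n") for line in itertools.islice(proc.stdout, max_lines)]
        proc.terminate()  # stop the dump early; the with-block then waits for exit
    return lines
```

For example, preview(["orc-contents", "examples/TestOrcFile.test1.orc"], max_lines=5) or preview(["java", "-jar", "orc-tools-1.2.3-uber.jar", "data", "myfile.orc"]).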
Owen O'Malley

Updated answer in year 2020:

Per @Owen's answer, ORC has grown up and matured as its own Apache project. A complete list of ORC Adopters shows how widely it is now supported across many Big Data technologies.

Credit to @Owen and the Apache ORC project team: the ORC project site has fully maintained, up-to-date documentation on using either the Java or the C++ standalone tools on ORC files stored on a local Linux file system, carrying the torch from the original Hive+ORC Apache wiki page.

Original answer dated: May 30 '14 at 16:27

The ORC file dump utility comes with Hive (0.11 or higher):

hive --orcfiledump <hdfs-location-of-orc-file>
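To print the rows themselves rather than only the metadata, the dump utility takes a -d flag, which (per the comment on this answer) is only available in Hive releases newer than 0.11. A small Python sketch that builds the command line either way; orcfiledump_cmd is a hypothetical helper name, and it assumes the hive launcher is on your PATH.

```python
def orcfiledump_cmd(location, data=False):
    """Hypothetical helper (not part of Hive): argv for Hive's ORC file dump.

    data=True adds -d so the rows are printed, not just the file metadata.
    """
    cmd = ["hive", "--orcfiledump"]
    if data:
        cmd.append("-d")  # dump row data; requires a Hive version that supports -d
    cmd.append(location)
    return cmd
```

The result can be passed to subprocess.run, e.g. subprocess.run(orcfiledump_cmd("<hdfs-location-of-orc-file>", data=True)).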


geekyj
Unfortunately, the "-d" argument, which actually outputs the data (as opposed to just the metadata), is only available from Hive 0.15. – Mass Dosage Apr 05 '16 at 09:05
FWIW, the original Hive+ORC wiki page now contains a table showing which Hive version introduced each new feature. https://cwiki.apache.org/confluence/display/Hive/LanguageManual+ORC#LanguageManualORC-ORCFileDumpUtility – geekyj Apr 13 '20 at 23:49

It's also possible to view the contents of an ORC file with a desktop application running on Linux.

There is a desktop application for viewing Parquet as well as other binary data formats such as ORC and Avro. It's a pure Java application, so it runs on Linux, Mac and Windows. Please check Bigdata File Viewer for details.

It supports complex data types such as array, map and struct.


Eugene