I have an ORC file on my local machine and need to convert it to any reasonable format (e.g. CSV, JSON, YAML, ...).

How can I convert ORC to CSV?

Martin Thoma

2 Answers

  1. Download the Apache ORC source release
  2. Extract the files, go to the java folder, and run Maven: mvn install
  3. Use ORC-Tools

This is how I use them - you will likely need to adjust the paths:

java -jar ~/.m2/repository/org/apache/orc/orc-tools/1.5.4/orc-tools-1.5.4-uber.jar data ~/your_file.orc > output.json

The output is JSON Lines, which is easy to convert to CSV. I first had to remove the last two lines of the output, since they are not part of the data. Then:

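import pandas as pd

# Each line of output.json is one JSON record (JSON Lines)
df = pd.read_json('output.json', lines=True)
# Write the records as CSV; by default pandas also writes the row index as the first column
df.to_csv('output.csv')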
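
If you would rather not trim the file by hand, the clean-up and the conversion can be combined. The following is only a minimal sketch using the standard library. It assumes the extra lines in the tool's output are not valid JSON objects, so it skips any line that does not parse as one. The function name jsonl_to_csv is hypothetical, and nested values (structs, lists) are written as their Python string representation.

```python
import csv
import json


def jsonl_to_csv(src_path, dst_path):
    """Convert a JSON Lines file to CSV, skipping lines that are not JSON objects.

    Returns the number of records written.
    """
    rows = []
    with open(src_path, encoding="utf-8") as src:
        for line in src:
            line = line.strip()
            if not line:
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                # Assumption: anything unparsable is non-data output from the tool.
                continue
            if isinstance(record, dict):
                rows.append(record)

    # Collect column names in first-seen order, since records may differ in keys.
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)

    with open(dst_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)  # missing keys are written as empty cells
    return len(rows)
```

With the file names used above, the call would be jsonl_to_csv('output.json', 'output.csv'). Unlike the pandas version, this does not write an index column.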
Martin Thoma

Another option is bigdata-file-viewer, a cross-platform application that can open an ORC file and save it in CSV format.

Detailed usage is as follows:

  • Download the runnable jar from the release page, or follow the Build section to build it from source.
  • Run it with java -jar BigdataFileViewer-1.2-SNAPSHOT-jar-with-dependencies.jar
  • Open a binary file via "File" -> "Open". Currently it can open files with a parquet, orc, or avro suffix. If the file has no suffix, the tool tries to read it as a Parquet file.
  • Set the maximum number of rows per page via "View" -> enter the maximum row number -> "Go"
  • Choose the visible properties via "View" -> "Add/Remove Properties"
  • Convert to a CSV file via "File" -> "Save as" -> "CSV"
  • Check the schema information by unfolding the "Schema Information" panel
Eugene