
How can I create a Scalding Source that handles conversions between Avro and Parquet?

The solution should:

 1. Read Parquet files and convert the records to Avro's in-memory representation
 2. Write Avro objects into a Parquet file
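For reference, the two conversions themselves (outside of Scalding) can be sketched with parquet-avro's `AvroParquetWriter` and `AvroParquetReader`. This is a minimal sketch, not the code from the fork linked in the answer. It assumes parquet-avro 1.8 or later (package `org.apache.parquet.avro`, which is newer than the `parquet.avro` package current when this was asked) and `hadoop-client` on the classpath. The `User` schema, the `AvroParquetRoundTrip` object and the temp-file path are hypothetical.

```scala
import java.nio.file.Files

import org.apache.avro.Schema
import org.apache.avro.generic.{GenericData, GenericRecord}
import org.apache.hadoop.fs.Path
import org.apache.parquet.avro.{AvroParquetReader, AvroParquetWriter}

// Hypothetical helper showing both directions of the Avro <-> Parquet conversion.
object AvroParquetRoundTrip {
  // Hypothetical example schema; any Avro record schema works the same way.
  val schema: Schema = new Schema.Parser().parse(
    """{"type": "record", "name": "User", "fields": [
      |  {"name": "name", "type": "string"},
      |  {"name": "age",  "type": "int"}
      |]}""".stripMargin)

  // Requirement 2: write Avro records into a Parquet file.
  def write(path: Path, records: Seq[GenericRecord]): Unit = {
    val writer = AvroParquetWriter.builder[GenericRecord](path)
      .withSchema(schema)
      .build()
    try records.foreach(writer.write)
    finally writer.close()
  }

  // Requirement 1: read a Parquet file back as Avro records.
  // read() returns null once the file is exhausted.
  def read(path: Path): Vector[GenericRecord] = {
    val reader = AvroParquetReader.builder[GenericRecord](path).build()
    try Iterator.continually(reader.read()).takeWhile(_ != null).toVector
    finally reader.close()
  }

  def main(args: Array[String]): Unit = {
    // Write into a fresh temp directory so the default CREATE mode never hits an existing file.
    val dir = Files.createTempDirectory("avro-parquet")
    val path = new Path(dir.toString, "users.parquet")

    val user = new GenericData.Record(schema)
    user.put("name", "alice")
    user.put("age", 30)

    write(path, Seq(user))
    read(path).foreach(println)
  }
}
```

Note that string fields come back as Avro `Utf8` values rather than `java.lang.String`, so compare them via `toString`. A Scalding `Source` would still need to wrap these read and write paths in a Cascading `Scheme`; this sketch only covers the format conversion.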

Note: I noticed that Cascading has a module for using Thrift with Parquet, which seems like a good place to start looking. I also opened a thread on the scalding-dev Google Group.

beefyhalo

1 Answer


Try the latest changes in our fork: https://github.com/epishkin/scalding/tree/parquet_avro/scalding-parquet

Oleksii
  • This is exactly what I was looking for. Projection support is the reason I was looking at Parquet. I'm going to use the Typed API with these sources to build a really beautiful app. Thanks so much! – beefyhalo Sep 15 '14 at 13:57
  • We have also been working on adding support for predicates, and I'll share that code soon. Note that it uses parquet-1.6.0rc2, which is not a released version of Parquet. – Oleksii Sep 15 '14 at 14:34
  • That would be absolutely fantastic :) – beefyhalo Sep 16 '14 at 03:37
  • Hey Oleksii, when do you plan to merge this into master? – user3335040 Dec 05 '14 at 05:06
  • @Oleksii Do you have any plans to open a merge request against the Scalding project? – Taky May 25 '15 at 13:22