I have a `TypedPipe[(String, String, Long)]` where the first String can take only a limited number (~10) of values. I'd like to partition my output so that a folder is created for each value (i.e. 10 folders, each named after a value of the first String). This is simple to achieve in Hive, but I cannot find an elegant way to do it in Scalding. The method `def partition(p: T => Boolean): (TypedPipe[T], TypedPipe[T])` splits the pipe into 2 parts, which is not what I'm looking for.

EDIT

  • I am using Scalding v0.13.1
  • I need to write a PackedAvroSource
Marsellus Wallace

1 Answer

If you group by the field you want to partition on, you can then use PartitionedDelimitedSource to write the directory structure you need. For example:

val pipe: TypedPipe[(String, String, Long)] = ...
pipe
    .groupBy(_._1)
    .write(PartitionedDelimited[String, (String, String, Long)](args("output"), "%s"))
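To make the intent concrete, here is a minimal sketch using only plain Scala collections, with no Scalding classes involved. It contrasts the two-way split that `partition` gives with the one-bucket-per-key shape a partitioned sink needs. The object name `PartitionSketch` and the sample records are hypothetical, chosen purely for illustration.

```scala
// Plain Scala collections only; nothing here is Scalding API.
object PartitionSketch {
  type Record = (String, String, Long)

  // Hypothetical sample data standing in for the pipe's contents.
  val records: List[Record] = List(
    ("typeA", "x", 1L),
    ("typeB", "y", 2L),
    ("typeA", "z", 3L)
  )

  // partition: always exactly two halves, decided by a predicate.
  val (matching, rest) = records.partition(_._1 == "typeA")

  // groupBy: one bucket per distinct key, which is the shape wanted
  // for one output folder per value of the first String.
  val buckets: Map[String, List[(String, Long)]] =
    records.groupBy(_._1).map { case (key, rows) =>
      key -> rows.map { case (_, name, count) => (name, count) }
    }
}
```

Each key of `buckets` corresponds to one folder name, and its values are the rows that would be written inside that folder.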
Dan Osipov
  • My `Grouped[K, (String, String, Long)]` does not let me 'write'... Could this be a version issue? Also, is there a way to write a `PackedAvroSource` with the same technique? – Marsellus Wallace Oct 04 '16 at 21:05
  • I also can't find `PartitionedDelimited` in my classpath. There is `PartitionedDelimitedSource` but it takes more arguments. – Marsellus Wallace Oct 04 '16 at 21:22