
Scalding has a great utility for running integration tests of a job flow. With this approach, the inputs and outputs are in-memory buffers:

import scala.collection.mutable
import com.twitter.scalding._

// In-memory test data: (offset, line) pairs for the TextLine source
val input = List("0" -> "This a a day")
val expectedOutput = List(("This", 1), ("a", 2), ("day", 1))

JobTest(classOf[WordCountJob].getName)
  .arg("input", "input-data")
  .arg("output", "output-data")
  .source(TextLine("input-data"), input)
  .sink(Tsv("output-data")) { buffer: mutable.Buffer[(String, Int)] =>
    buffer should equal(expectedOutput)
  }
  .run

How can I write code that reads the input from and writes the output to real local files, like FileTap/Lfs in Cascading, rather than using the in-memory approach?

Julias
  • You can do it manually, using java.io.File, but I'm guessing that's not what you're looking for? `val input = io.Source.fromFile(INPUT_FILE).getLines().flatMap((line: String) => List(line.hashCode -> line)).toList` – Dan Osipov Apr 24 '15 at 17:51
  • Great, thanks! But what if I have a folder and want to run on many files? Actually, I'm looking for something like an Lfs tap, but in Scalding... (see the sketch after these comments) – Julias Apr 29 '15 at 18:46
  • Dan, you're ever-helpful. You should put that comment in as an answer so I can upvote it. – nont Sep 03 '15 at 20:35
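
Building on Dan Osipov's comment, here is a minimal sketch of how that manual approach could be extended to a folder of files. The directory path and the use of listFiles are illustrative assumptions, not part of Scalding's API:

import java.io.File
import scala.io.Source

// Assumed location of the local input folder; adjust to your layout.
val inputDir = new File("src/test/resources/input")

// Read every regular file in the folder into (offset, line) pairs,
// mirroring the single-file snippet from the comment above.
val input: List[(String, String)] =
  inputDir.listFiles.toList
    .filter(_.isFile)
    .flatMap { file =>
      val source = Source.fromFile(file)
      try source.getLines().map(line => line.hashCode.toString -> line).toList
      finally source.close()
    }

The resulting list can then be passed to JobTest's .source(...) exactly as in the in-memory example above.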

1 Answer


You might check out HadoopPlatformJobTest and the TypedParquetTupleTest.scala example, which uses a local mini-cluster.

That test writes to a mini local cluster. While the output is not directly a local file, it is accessible by reading from the mini-cluster through the Hadoop filesystem API.

Given your local-file scenario, you could copy the local files into the mini-HDFS before running the job.
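
As a rough sketch of that staging step, the standard Hadoop FileSystem API can copy a local folder into the cluster's filesystem. The paths here are placeholders, and in a real test the Configuration would come from the mini-cluster rather than a default one:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// In a HadoopPlatformJobTest the configuration should come from the
// mini-cluster under test; a default Configuration is just a placeholder.
val conf = new Configuration()
val fs = FileSystem.get(conf)

// copyFromLocalFile copies directories recursively, so a whole folder
// of input files can be staged into HDFS in one call.
fs.copyFromLocalFile(new Path("/tmp/local-input"), new Path("/input-data"))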

codeaperature