
Scalding has a great utility for running integration tests of a job flow. With this approach, the inputs and outputs are in-memory buffers:

import scala.collection.mutable
import com.twitter.scalding._

// In-memory test data: (offset, line) pairs for the TextLine source
val input = List("0" -> "This a a day")
val expectedOutput = List(("This", 1), ("a", 2), ("day", 1))

JobTest(classOf[WordCountJob].getName)
  .arg("input", "input-data")
  .arg("output", "output-data")
  .source(TextLine("input-data"), input)
  .sink(Tsv("output-data")) { buffer: mutable.Buffer[(String, Int)] =>
    buffer should equal(expectedOutput)
  }
  .run

How can I write code that reads the input from and writes the output to real local files, like FileTap/Lfs in Cascading, rather than using the in-memory approach?

Julias
  • You can do it manually, using java.io.File, but I'm guessing that's not what you're looking for? `val input = io.Source.fromFile(INPUT_FILE).getLines().flatMap((line: String) => List(line.hashCode -> line)).toList` – Dan Osipov Apr 24 '15 at 17:51
  • Great, thanks! But what if I have a folder and want to run on many files? Actually, I'm looking for something like an Lfs tap, but in Scalding... (see the sketch after these comments) – Julias Apr 29 '15 at 18:46
  • Dan, you're ever-helpful. You should put that comment in as an answer so I can upvote it. – nont Sep 03 '15 at 20:35
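
Building on Dan Osipov's comment, here is a minimal sketch of how that manual approach could be extended to a folder of files. The directory path and the use of listFiles are illustrative assumptions, not part of Scalding's API:

import java.io.File
import scala.io.Source

// Assumed location of the local input folder; adjust to your layout.
val inputDir = new File("src/test/resources/input")

// Read every regular file in the folder into (offset, line) pairs,
// mirroring the single-file snippet from the comment above.
val input: List[(String, String)] =
  inputDir.listFiles.toList
    .filter(_.isFile)
    .flatMap { file =>
      val source = Source.fromFile(file)
      try source.getLines().map(line => line.hashCode.toString -> line).toList
      finally source.close()
    }

The resulting list can then be passed to JobTest's .source(...) exactly as in the in-memory example above.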

1 Answer


You might check out HadoopPlatformJobTest and the TypedParquetTupleTest.scala example, which uses a local mini-cluster.

That test writes to a mini local cluster. While the output is not directly a local file, it is accessible by reading from the mini-cluster through the Hadoop filesystem API.

Given your local-file scenario, you could copy the local files into the mini-HDFS before running the job.
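
As a rough sketch of that staging step, the standard Hadoop FileSystem API can copy a local folder into the cluster's filesystem. The paths here are placeholders, and in a real test the Configuration would come from the mini-cluster rather than a default one:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// In a HadoopPlatformJobTest the configuration should come from the
// mini-cluster under test; a default Configuration is just a placeholder.
val conf = new Configuration()
val fs = FileSystem.get(conf)

// copyFromLocalFile copies directories recursively, so a whole folder
// of input files can be staged into HDFS in one call.
fs.copyFromLocalFile(new Path("/tmp/local-input"), new Path("/input-data"))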

codeaperature