
With Hadoop 2.2 installed on a single node, I am trying to run the Scalding tutorial, part 1, with this command:

$ yarn jar target/scalding-tutorial-0.8.11.jar Tutorial0 --hdfs

https://github.com/Cascading/scalding-tutorial/

Before running the tutorial, I copied the required file hello.txt to HDFS:

$ hdfs dfs -ls /data
Found 2 items
drwxr-xr-x   - hdfs hdfs          0 2014-02-04 16:35 /data/10gsort
-rw-r--r--   3 hdfs hdfs         26 2014-07-03 15:07 /data/hello.txt

It looks like the tutorial cannot find the input file:

Exception in thread "main" com.twitter.scalding.InvalidSourceException:[TextLine(data/hello.txt)] Data is missing from one or more paths in: List(data/hello.txt)
at com.twitter.scalding.FileSource.validateTaps(FileSource.scala:102)
at com.twitter.scalding.Job$$anonfun$validateSources$1.apply(Job.scala:158)
at com.twitter.scalding.Job$$anonfun$validateSources$1.apply(Job.scala:153)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1156)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at com.twitter.scalding.Job.validateSources(Job.scala:153)
at com.twitter.scalding.Job.buildFlow(Job.scala:91)
at com.twitter.scalding.Job.run(Job.scala:126)
at com.twitter.scalding.Tool.start$1(Tool.scala:109)
at com.twitter.scalding.Tool.run(Tool.scala:125)
at com.twitter.scalding.Tool.run(Tool.scala:72)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at JobRunner$.main(JobRunner.scala:27)
at JobRunner.main(JobRunner.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)

Any ideas how to make it work?

DarqMoth
  • Looks like it's looking for `data/hello.txt` and not `/data/hello.txt`; might that be the case? – merours Jul 03 '14 at 13:42
  • Yes, that's how it looks to me as well. But how can I fix this? Every tutorial program works in both `local` and `hdfs` mode. When the `--hdfs` argument is given, the tutorial code `hadoop.util.ToolRunner.run(new hadoop.conf.Configuration, new Tool, args);` should take care of converting the argument `data/hello.txt` to `/data/hello.txt` appropriately. But it does not. – DarqMoth Jul 03 '14 at 13:53
  • In fact, `data/hello.txt` and `/data/hello.txt` are not the same. Are you familiar with relative and absolute paths? – merours Jul 03 '14 at 13:58
  • Yes, I am. Scalding also comes with a Ruby script, `scald.rb`, to run Scalding jobs. It allows switching from local development to Hadoop mode with a single switch, which I assume means converting argument paths from local to HDFS paths, such as `data/hello.txt` to `/data/hello.txt`. But I don't see where the tutorial runs this script ... – DarqMoth Jul 03 '14 at 14:11

1 Answer


TextLine builds a Hadoop Path from the given path string and the job configuration.

The Hadoop Path API states that "a path string is absolute if it begins with a slash."

The tutorial sets the input to "data/hello.txt", which is a relative path, so the current working directory (on HDFS, typically your home directory /user/<username>) is prepended to form the absolute path. Since your file is at /data/hello.txt rather than under that directory, the source validation fails.
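
If you want to keep the tutorial's relative path, one workaround (a sketch, assuming relative HDFS paths resolve against your home directory /user/<your-user>, the Hadoop default) is to copy the file to the matching relative location:

$ hdfs dfs -mkdir -p data                      # resolves to /user/<your-user>/data
$ hdfs dfs -cp /data/hello.txt data/hello.txt  # copy the file to the relative location
$ hdfs dfs -ls data                            # verify it is now visible under the relative path

Alternatively, change the input in the tutorial source to the absolute path "/data/hello.txt" and rebuild the jar.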

aihex