Loading extremely long lines with TextLine in Cascading

Question

I'm using TextLine in Cascading to load files with very long lines: around 30 MB on average, and some much longer. When I run the job locally to test it, it runs fine, but when I run it on the cluster it fails after a period of intensive crunching, with errors like:

cascading.tuple.TupleException: unable to read from input identifier: maprfs:/xxx/xxx/xxx/part-00001
    at cascading.tuple.TupleEntrySchemeIterator.hasNext(TupleEntrySchemeIterator.java:127)
    at cascading.flow.stream.SourceStage.map(SourceStage.java:76)
    at cascading.flow.stream.SourceStage.run(SourceStage.java:58)
    at cascading.flow.hadoop.FlowMapper.run(FlowMapper.java:127)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:443)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:353)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:282)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1122)
    at org.apache.hadoop.mapred.Child.main(Child.java:271)

It also sometimes complains about stale file handles. The file it's trying to read is definitely there. Can somebody help me, please?
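One way to double-check the line-length figures outside of Cascading and the cluster is to scan a local copy of one of the part files and report the longest and average line length. The sketch below is a hypothetical diagnostic (the class and method names are made up for illustration, not part of Cascading or Hadoop). It assumes the file has been copied to local disk and that lines end with '\n'. It counts bytes, so any '\r' before the newline is included. It reads in fixed-size chunks, so memory use stays constant no matter how long a single line is.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical diagnostic: line-length statistics without ever buffering a whole line. */
public class LineLengthStats {

    /** Counts gathered by scan(); lengths are in bytes, excluding the '\n' terminator. */
    public static final class Stats {
        public long lines;
        public long maxLength;
        public long totalBytes;

        public double averageLength() {
            return lines == 0 ? 0.0 : (double) totalBytes / lines;
        }
    }

    /** Reads the stream in 64 KB chunks and tracks only the length of the current line. */
    public static Stats scan(InputStream in) throws IOException {
        Stats stats = new Stats();
        long current = 0;        // bytes seen so far in the line being read
        boolean pending = false; // true if bytes were seen since the last '\n'
        byte[] buf = new byte[64 * 1024];
        int n;
        while ((n = in.read(buf)) != -1) {
            for (int i = 0; i < n; i++) {
                if (buf[i] == '\n') {
                    record(stats, current);
                    current = 0;
                    pending = false;
                } else {
                    current++;
                    pending = true;
                }
            }
        }
        if (pending) {
            record(stats, current); // final line without a trailing newline
        }
        return stats;
    }

    private static void record(Stats stats, long length) {
        stats.lines++;
        stats.totalBytes += length;
        stats.maxLength = Math.max(stats.maxLength, length);
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to a local copy of a part file (assumption: copied off the cluster first)
        try (InputStream in = new FileInputStream(args[0])) {
            Stats s = scan(in);
            System.out.printf("lines=%d max=%d bytes avg=%.1f bytes%n",
                    s.lines, s.maxLength, s.averageLength());
        }
    }
}
```

Running it as `java LineLengthStats part-00001` prints one summary line. If the maximum is far above the ~30 MB average, that file is a good candidate for reproducing the failure in isolation.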

Are you sure that is the complete stacktrace? Is it related to https://groups.google.com/forum/#!topic/cascading-user/TlKjFdnOa84 ? — Alfonso Nishikawa, Aug 14 '14 at 19:16
Here is a stack trace from one of the map tasks: http://pastebin.com/9JCbsmcr . I don't see how your link is related to my problem. My problem is with reading very long lines from a text file using TextLine; I don't use sequence files. — Savage Reader, Aug 15 '14 at 10:52
You are right, it's not about sequence files. Anyway, the stack trace gives some information: it seems to be related to MapR :( I found the offending line to be `if (this.curPos_ + length > this.inode_.eof()) {`, but (if I am right) who knows why `inode_` is null :( — Alfonso Nishikawa, Aug 16 '14 at 16:19
I've opened a case with MapR support, I hope it gets solved soon. — Savage Reader, Aug 18 '14 at 13:10
