
Are there any simple, easily launched Java tools for visualizing key/value data from MapReduce job directories?

Specifically, I want to browse a 20-job MapReduce workflow, clicking on individual files to look at the data, and maybe even see a histogram of file sizes.
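Without a dedicated tool, the file-size histogram is a small stdlib-only program. The sketch below assumes the job outputs are on the local file system (for example, a pipeline run in local mode, or outputs copied down with `hadoop fs -get`). It walks a directory tree and counts files in power-of-two size buckets. The class name `JobDirHistogram` is hypothetical, and skipping names that start with `_` or `.` (such as `_SUCCESS` markers and `.crc` checksum files) is an assumption borrowed from Hadoop's hidden-file convention.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical dev tool: power-of-two histogram of file sizes under a local job directory.
public class JobDirHistogram {

    // Bucket 0 holds empty files; bucket k >= 1 holds sizes in [2^(k-1), 2^k - 1].
    static int bucketFor(long size) {
        return size == 0 ? 0 : 64 - Long.numberOfLeadingZeros(size);
    }

    // Human-readable range for a bucket, e.g. bucket 11 -> "1024-2047 B".
    static String label(int bucket) {
        if (bucket == 0) return "0 B";
        long lo = 1L << (bucket - 1);
        long hi = (1L << bucket) - 1;
        return lo + "-" + hi + " B";
    }

    // Assumption: skip files named like Hadoop's hidden files (_SUCCESS, .crc checksums).
    // Only the file name is checked, not parent directories such as _logs.
    static boolean isDataFile(Path p) {
        if (!Files.isRegularFile(p)) return false;
        String name = p.getFileName().toString();
        return !name.startsWith("_") && !name.startsWith(".");
    }

    // Counts data files under root, keyed by size bucket (sorted ascending).
    static Map<Integer, Integer> histogram(Path root) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(root)) {
            files = walk.filter(JobDirHistogram::isDataFile).collect(Collectors.toList());
        }
        Map<Integer, Integer> counts = new TreeMap<>();
        for (Path p : files) {
            counts.merge(bucketFor(Files.size(p)), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (Map.Entry<Integer, Integer> e : histogram(root).entrySet()) {
            int n = e.getValue();
            // Bar is capped at 50 characters so huge buckets stay readable.
            System.out.printf("%-26s %6d %s%n", label(e.getKey()), n, "#".repeat(Math.min(n, 50)));
        }
    }
}
```

Running it as `java JobDirHistogram /path/to/workflow/output` prints one line per non-empty bucket. Power-of-two buckets keep the output short even when part files range from bytes to gigabytes.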

  • There are some caveats here, for example:

    • Some files contain serialized data (not just text).
    • Obviously, this system would probably be silly to use at "cloud scale"; rather, it's a dev tool.
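Because some outputs are serialized rather than plain text, a browser would first need to guess each file's format before choosing a viewer. The sketch below (hypothetical class `OutputSniffer`) reads the first few KB of each file and checks a few well-known headers: Hadoop SequenceFiles start with the bytes `SEQ`, Avro container files with `Obj` followed by byte `1`, and gzip streams with `0x1f 0x8b`. Anything else is classified with a simple heuristic (stray control bytes mean binary). This is an assumption-laden sketch: raw Thrift binary data has no standard file magic, so it can only land in the `BINARY` bucket, and actually decoding it would need the project's own Thrift classes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: guess the format of each file in a local job output tree.
public class OutputSniffer {

    enum Kind { EMPTY, SEQUENCE_FILE, AVRO, GZIP, TEXT, BINARY }

    static Kind sniff(byte[] head) {
        if (head.length == 0) return Kind.EMPTY;
        if (startsWith(head, 'S', 'E', 'Q')) return Kind.SEQUENCE_FILE;  // Hadoop SequenceFile header
        if (startsWith(head, 'O', 'b', 'j', 1)) return Kind.AVRO;        // Avro object container file
        if (startsWith(head, 0x1f, 0x8b)) return Kind.GZIP;              // gzip-compressed output
        // Heuristic: control bytes other than tab/LF/CR suggest a binary format
        // (e.g. raw Thrift); bytes >= 0x80 are allowed so UTF-8 text still passes.
        for (byte b : head) {
            if (b >= 0 && b < 0x20 && b != '\t' && b != '\n' && b != '\r') return Kind.BINARY;
        }
        return Kind.TEXT;
    }

    // True if data begins with the given unsigned byte values.
    private static boolean startsWith(byte[] data, int... magic) {
        if (data.length < magic.length) return false;
        for (int i = 0; i < magic.length; i++) {
            if ((data[i] & 0xff) != magic[i]) return false;
        }
        return true;
    }

    // Reads at most max bytes from the start of the file.
    static byte[] readHead(Path p, int max) throws IOException {
        try (InputStream in = Files.newInputStream(p)) {
            return in.readNBytes(max);
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        List<Path> files;
        try (Stream<Path> walk = Files.walk(root)) {
            files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        for (Path p : files) {
            System.out.printf("%-14s %10d  %s%n", sniff(readHead(p, 4096)), Files.size(p), p);
        }
    }
}
```

The point of separating `sniff` from file I/O is modularity: a viewer could map each `Kind` to a pluggable decoder, which fits the "lightweight, extensible jar" the question is after.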

Nevertheless, such a tool would be useful for developing and locally debugging large, connected M/R pipelines.

This is for development purposes (I'm not trying to visualize distributed key/value Hadoop data in a real cluster).

jayunit100

1 Answer


Check out KarmaSphere Studio:

Monitor Job Execution Step-by-Step
- Workflow with Results: Shows the resulting output at each step of the MapReduce job.
- Hadoop Logs from the Desktop: Accesses Hadoop logs easily from the desktop.
- Job Failure Options: Allows the specification of job failure options such as automatic invocation of a specified script upon job failure for EMR.

Praveen Sripati
  • Thanks, this looks like a heavyweight utility... Any specifics on how I could use KarmaSphere to solve my specific issues? – jayunit100 Jan 08 '12 at 16:14
  • Sorry, but I realized that KarmaSphere doesn't do what I need, although it seems to. It can only monitor and browse job data when you execute the job in KarmaSphere. – jayunit100 Jan 11 '12 at 19:23
  • I want a tool that is more stateless and modular; I think KarmaSphere is restrictive. I want a lightweight, modular, extensible jar file, since I have binary-serialized Thrift data, etc. KarmaSphere's paradigm seems better suited to starting a new code base than to debugging an existing one. – jayunit100 Jan 13 '12 at 18:53