If you can do it in Pig (or Hive), do it in Pig (or Hive).
Otherwise, do it in Java MapReduce.
Benefits of Pig:
Structured data like CSV is REALLY easy to load and use
Not much slower than hand-written Java MapReduce
Not prone to Java-level bugs
Easier to read and write
No need to compile: easier to maintain, easier to deploy
A few things look impossible in Pig at first and tempt you to reach for Java, but they turn out to be doable once you know Pig better:
You can write user-defined loaders in Java. You are going to write some Java to parse that complicated data format anyway, so why not do it in a Pig loader?
Nesting map and bag datatypes can model hierarchical data structures pretty well, but you'll probably have to write a ton of UDFs.
You can run Java MapReduce jobs from within Pig (the MAPREDUCE operator). This lets you do the one hard operation in Java MapReduce and the easier stuff in Pig.
These are only a few examples, but you get the point. Pig is very customizable, and you'll end up writing less Java overall.
Basic stuff is easy, and things like hierarchical data structures and custom loading are doable with a bit of effort. OK, so what's left?
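To make the loader point concrete, here is a minimal sketch of the parsing half of such a loader in plain Java. The record format (`key=value` pairs separated by `;`) and the class and method names are made up for illustration. In an actual Pig loader, a method like this would be called from `LoadFunc.getNext()`, which would then copy the fields into a Pig tuple. The Pig wiring is left out so the sketch runs without Pig on the classpath.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical record format, e.g. "id=42;user=alice;tags=a|b|c".
// Only the parsing logic is shown; the LoadFunc subclass around it is assumed.
final class RecordParserSketch {

    // Parses one line into an ordered field map. Returns null for blank or
    // malformed lines so the caller can skip them instead of failing the job.
    static Map<String, String> parseLine(String line) {
        if (line == null || line.trim().isEmpty()) {
            return null;
        }
        Map<String, String> fields = new LinkedHashMap<>();
        for (String pair : line.trim().split(";")) {
            int eq = pair.indexOf('=');
            if (eq <= 0) {
                return null; // missing '=' or empty key
            }
            fields.put(pair.substring(0, eq), pair.substring(eq + 1));
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(parseLine("id=42;user=alice;tags=a|b|c"));
        System.out.println(parseLine("garbage line")); // malformed, so null
    }
}
```

The point of keeping the parser as a separate static method is that the same Java you would have written for a MapReduce mapper moves into the loader unchanged, and the rest of the pipeline stays in Pig.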
Exotic uses of partitioners to do something MapReduce isn't intended for.
Really nasty data formats or completely unstructured data (video, audio, raw human-readable text)
Complex operations that rely on the DistributedCache (basic cases are covered by JOIN ... USING 'replicated')
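For context on the last item: a replicated join in Pig is written roughly as `JOIN big BY k, small BY k USING 'replicated'`, with the small relation listed last. Conceptually, the small relation is held in memory on every map task while the large one is streamed past it. The plain-Java sketch below only illustrates that idea. It is not Pig's implementation, and the class name, method names, and sample data are invented for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative in-memory hash join; rows are {key, value} string pairs.
final class ReplicatedJoinSketch {

    // Builds the lookup table from the small relation (the part that would
    // be shipped to every map task). A key may map to several values.
    static Map<String, List<String>> buildLookup(List<String[]> small) {
        Map<String, List<String>> lookup = new HashMap<>();
        for (String[] row : small) {
            lookup.computeIfAbsent(row[0], k -> new ArrayList<>()).add(row[1]);
        }
        return lookup;
    }

    // Streams the large relation past the lookup table and emits one joined
    // row per match. Inner join: large rows without a match are dropped.
    static List<String> join(List<String[]> large, Map<String, List<String>> lookup) {
        List<String> out = new ArrayList<>();
        for (String[] row : large) {
            List<String> matches = lookup.get(row[0]);
            if (matches == null) {
                continue;
            }
            for (String match : matches) {
                out.add(row[0] + "," + row[1] + "," + match);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> countries = new ArrayList<>();
        countries.add(new String[] {"us", "United States"});
        countries.add(new String[] {"de", "Germany"});

        List<String[]> visits = new ArrayList<>();
        visits.add(new String[] {"us", "alice"});
        visits.add(new String[] {"fr", "bob"});
        visits.add(new String[] {"de", "carol"});

        for (String line : join(visits, buildLookup(countries))) {
            System.out.println(line);
        }
    }
}
```

Anything that fits this shape (small side in memory, simple key lookup) is the "basic" case. Once the cached data needs custom indexing, non-equality matching, or mutation during the job, you are back to writing the lookup yourself in Java.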
Hopefully others can use the comments to add things they couldn't do in Pig.