0

According to the Amazon EMR "Developer Guide", the files in the input directory should be formatted as plain text. Does that mean I cannot upload binary files such as .png files and parse them with a Python script?

kururu

1 Answer

0

Likely not. See for example: https://groups.google.com/a/cloudera.org/forum/?fromgroups=#!topic/cdh-user/AUUZ0DKiJGw

What you can do is make the input data the file names themselves (with the files stored in S3 or HDFS). The Hadoop streaming script then receives file names as its input, and it can open and process each file as it sees fit.
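A minimal sketch of that idea, assuming each input line is a path the script can open directly (for example a local or mounted file). Reading from S3 or HDFS would need the matching client library instead of `open`, which is not shown here. The names `describe_file`, `map_paths` and `main` are made up for illustration, and the size/PNG check just stands in for whatever processing you actually need.

```python
import sys

# The fixed 8-byte signature that starts every PNG file.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def describe_file(path):
    """Open one file in binary mode and return (size_in_bytes, is_png)."""
    # Assumption: `path` can be opened with the built-in open().
    with open(path, "rb") as handle:
        data = handle.read()
    return len(data), data.startswith(PNG_SIGNATURE)


def map_paths(lines):
    """Yield one tab-separated record per file name found in `lines`."""
    for line in lines:
        path = line.strip()
        if not path:
            continue  # skip blank input lines
        size, is_png = describe_file(path)
        yield f"{path}\t{size}\t{int(is_png)}"


def main(stdin=sys.stdin, stdout=sys.stdout):
    """Streaming entry point: file names in on stdin, records out on stdout."""
    for record in map_paths(stdin):
        stdout.write(record + "\n")

# To use this as a streaming mapper, end the script with a call to main().
```

The input file for the job is then a plain text list of file names, one per line, so it stays within the plain-text requirement while the binary data is read by the script itself.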

Joydeep Sen Sarma
  • I tried it myself. Binary files are accepted as input, but each binary file is cut into several smaller pieces when it is loaded. – kururu May 02 '13 at 08:31