
I want to serialize a String array "textData" and send it from the mapper to the reducer.

  public void map(LongWritable key, Text value, OutputCollector<IntWritable, Text> output,
                  Reporter reporter) throws IOException {

      Path pt = new Path("E:\\spambase.txt");
      FileSystem fs = FileSystem.get(new Configuration());
      BufferedReader textReader = new BufferedReader(new InputStreamReader(fs.open(pt)));

      int numberOfLines = readLines(); // helper defined elsewhere in the question's class
      String[] textData = new String[numberOfLines];
      for (int i = 0; i < numberOfLines; i++) {
          textData[i] = textReader.readLine();
      }
      textReader.close();
programer
  • Please review your code formatting and say what you already tried. Did you search older questions? https://stackoverflow.com/questions/30945769/in-a-mapreduce-how-to-send-arraylist-as-value-from-mapper-to-reducer https://stackoverflow.com/questions/15810550/output-a-list-from-a-hadoop-map-reduce-job-using-custom-writable – rad Dec 18 '17 at 19:52
  • Yes, I searched older questions, but I can't find the answer. – programer Dec 19 '17 at 13:13
  • I understood that the best way to send an array from mapper to reducer is serializing it, and I don't know how to do it. – programer Dec 19 '17 at 13:20
  • Can anyone help me? – programer Dec 19 '17 at 13:20
  • The code above is my mapper and "textData" is my array. I want to serialize it and send it to the reducer. – programer Dec 19 '17 at 13:23

1 Answer


You seem to have some misunderstanding about how the MapReduce process works.

Ideally, a mapper should not read an entire file by itself.

A Job object generates a collection of InputSplits for a given input path.
By default, Hadoop reads each split line by line (the input path can be a directory of files or a single file).
Each line is passed, one at a time, into the Text value of your map method, with the LongWritable key set to the byte offset of that line within the input.
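As a sketch of what that setup looks like with the old mapred API the question uses (the class name SpamJob and the paths are illustrative placeholders, not from the question):

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

// Sketch: configure the file as the job's input so the framework splits it
// and feeds map() one line at a time, instead of opening it inside map().
public class SpamJob {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(SpamJob.class);
        conf.setJobName("spambase");

        // HDFS input path, not a local E:\ path
        FileInputFormat.setInputPaths(conf, new Path("/user/hadoop/spambase.txt"));
        FileOutputFormat.setOutputPath(conf, new Path("/user/hadoop/spambase-out"));

        conf.setOutputKeyClass(IntWritable.class);
        conf.setOutputValueClass(Text.class);

        JobClient.runJob(conf); // each map() call then receives one line in `value`
    }
}
```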

It's not clear what you are trying to output, but you're looking for the ArrayWritable class, and you serialize data to a reducer by calling output.collect(). However, you need to change your mapper output types from IntWritable, Text so that you can call output.collect(some_key, new ArrayWritable(textData)).
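One caveat worth knowing: a plain ArrayWritable does not carry its element class through serialization, so the reducer cannot deserialize it; the usual approach is a small subclass with a no-arg constructor. A sketch (the name TextArrayWritable is an illustrative assumption):

```java
import org.apache.hadoop.io.ArrayWritable;
import org.apache.hadoop.io.Text;

// Sketch: a Writable array of Text values. The no-arg constructor lets
// Hadoop instantiate and deserialize it on the reducer side.
public class TextArrayWritable extends ArrayWritable {
    public TextArrayWritable() {
        super(Text.class);
    }

    public TextArrayWritable(String[] strings) {
        super(Text.class, toTexts(strings));
    }

    private static Text[] toTexts(String[] strings) {
        Text[] texts = new Text[strings.length];
        for (int i = 0; i < strings.length; i++) {
            texts[i] = new Text(strings[i]);
        }
        return texts;
    }
}
```

The mapper would then emit `output.collect(some_key, new TextArrayWritable(textData));`, with `TextArrayWritable.class` declared as the map output value class in the job configuration.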

It's worth pointing out that you're using the deprecated mapred libraries, not the mapreduce ones, and that E:\\ is not an HDFS path but a local-filesystem one.
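For comparison, under the newer org.apache.hadoop.mapreduce API the mapper would look roughly like this (a sketch; the type parameters mirror the question's mapper, and the class name SpamMapper is a placeholder):

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Sketch: new-API mapper. `value` already holds one line of the input split,
// so there is no need to open or read the file manually inside map().
public class SpamMapper extends Mapper<LongWritable, Text, IntWritable, Text> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        context.write(new IntWritable(1), value);
    }
}
```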

OneCricketeer