
I want to serialize a String array "textData" and send it from the mapper to the reducer.

  public void map(LongWritable key, Text value, OutputCollector<IntWritable, Text> output,
                  Reporter reporter) throws IOException {

      Path pt = new Path("E:\\spambase.txt");
      FileSystem fs = FileSystem.get(new Configuration());
      BufferedReader textReader = new BufferedReader(new InputStreamReader(fs.open(pt)));

      int numberOfLines = readLines(); // helper defined elsewhere in the question's class
      String[] textData = new String[numberOfLines];
      for (int i = 0; i < numberOfLines; i++) {
          textData[i] = textReader.readLine();
      }
      textReader.close();
programer
  • Please review your code formatting and say what you already tried. Did you search older questions? https://stackoverflow.com/questions/30945769/in-a-mapreduce-how-to-send-arraylist-as-value-from-mapper-to-reducer https://stackoverflow.com/questions/15810550/output-a-list-from-a-hadoop-map-reduce-job-using-custom-writable – rad Dec 18 '17 at 19:52
  • Yes, I searched older questions, but I can't find the answer. – programer Dec 19 '17 at 13:13
  • I understood that the best way to send an array from mapper to reducer is serializing it, and I don't know how to do it. – programer Dec 19 '17 at 13:20
  • Can anyone help me? – programer Dec 19 '17 at 13:20
  • The code above is my mapper and "textData" is my array. I want to serialize it and send it to the reducer. – programer Dec 19 '17 at 13:23

1 Answer


You seem to have some misunderstanding about how the MapReduce process works.

Ideally, a mapper should not read an entire file by itself.

A Job object generates a collection of InputSplits for a given input path.
By default, Hadoop reads each split line by line (the input path can be a directory of files or a single file).
Each line is passed, one at a time, into the Text value of your map method, with the LongWritable key set to the byte offset of that line within the input.
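As a sketch of what that setup looks like with the old mapred API the question uses (the class name SpamJob and the paths are illustrative placeholders, not from the question):

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

// Sketch: configure the file as the job's input so the framework splits it
// and feeds map() one line at a time, instead of opening it inside map().
public class SpamJob {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(SpamJob.class);
        conf.setJobName("spambase");

        // HDFS input path, not a local E:\ path
        FileInputFormat.setInputPaths(conf, new Path("/user/hadoop/spambase.txt"));
        FileOutputFormat.setOutputPath(conf, new Path("/user/hadoop/spambase-out"));

        conf.setOutputKeyClass(IntWritable.class);
        conf.setOutputValueClass(Text.class);

        JobClient.runJob(conf); // each map() call then receives one line in `value`
    }
}
```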

It's not clear what you are trying to output, but you're looking for the ArrayWritable class, and you serialize data to a reducer by calling output.collect(). However, you need to change your mapper output types from IntWritable, Text so that you can call output.collect(some_key, new ArrayWritable(textData)).
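One caveat worth knowing: a plain ArrayWritable does not carry its element class through serialization, so the reducer cannot deserialize it; the usual approach is a small subclass with a no-arg constructor. A sketch (the name TextArrayWritable is an illustrative assumption):

```java
import org.apache.hadoop.io.ArrayWritable;
import org.apache.hadoop.io.Text;

// Sketch: a Writable array of Text values. The no-arg constructor lets
// Hadoop instantiate and deserialize it on the reducer side.
public class TextArrayWritable extends ArrayWritable {
    public TextArrayWritable() {
        super(Text.class);
    }

    public TextArrayWritable(String[] strings) {
        super(Text.class, toTexts(strings));
    }

    private static Text[] toTexts(String[] strings) {
        Text[] texts = new Text[strings.length];
        for (int i = 0; i < strings.length; i++) {
            texts[i] = new Text(strings[i]);
        }
        return texts;
    }
}
```

The mapper would then emit `output.collect(some_key, new TextArrayWritable(textData));`, with `TextArrayWritable.class` declared as the map output value class in the job configuration.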

It's worth pointing out that you're using the deprecated mapred libraries, not the mapreduce ones, and that E:\\ is not an HDFS path but a local-filesystem one.
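For comparison, under the newer org.apache.hadoop.mapreduce API the mapper would look roughly like this (a sketch; the type parameters mirror the question's mapper, and the class name SpamMapper is a placeholder):

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Sketch: new-API mapper. `value` already holds one line of the input split,
// so there is no need to open or read the file manually inside map().
public class SpamMapper extends Mapper<LongWritable, Text, IntWritable, Text> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        context.write(new IntWritable(1), value);
    }
}
```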

OneCricketeer