Hadoop map-reduce mapper programming

Question

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;


public class ADDMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, LongWritable> {

    @Override
    public void map(LongWritable key, Text value,
                    OutputCollector<Text, LongWritable> output, Reporter r)
            throws IOException {
        String s = value.toString();
        char[] words = s.toCharArray();
        int wno = 0;
        int ino = 0;
        for (int i = 0; i < words.length; i++) {
            String temp = "";
            for (int j = ino; j < words.length; j++) {
                if (words[j] != ' ') {
                    temp += words[j];
                } else {
                    wno = j;
                    if (temp != "") {
                        ino = ino + key; //////POINT OF ERROR
                        output.collect(new Text(temp), new LongWritable(ino));
                    }
                    temp = "";
                    ino = wno + 1;
                    break;
                }
            }
        }
    }
}

I want to get the index value of every string, with the output sorted by string.
The code above neither gives the index values nor shuffles the strings. Suppose the input file is: hi how are you hi i am right. how is your job. hi are you ok.

Expected output: am 50 are 7,33 hi 0,30,44 how 3,14 ...

Could you (a) format your code properly, and (b) NOT ASK QUESTIONS IN CAPS please? Also, read [how do I ask a good question](http://stackoverflow.com/help/how-to-ask) for some further tips. Your question, as it is, will receive few answers. — Wai Ha Lee, Apr 05 '15 at 09:07

score 1 · Answer 1 · answered Apr 05 '15 at 13:20


Hi Shivendra, I wrote the mapper logic below to help you find the index of each string, with sorted output. The output of this code is each string, in sorted order, with its index; you can then run a reducer on this output.

// needs: java.util.HashMap, java.util.Map, java.util.TreeMap
String str = value.toString();
String[] tokens = str.split(" "); // split into words
// collect the unique words
Map<String, Integer> uniqueString = new HashMap<String, Integer>();
for (int i = 0; i < tokens.length; i++) {
    if (!tokens[i].isEmpty()) { // skip empty tokens; indexOf("") would never return -1
        uniqueString.put(tokens[i], 1);
    }
}
// copy into a TreeMap so the words are visited in sorted order
Map<String, Integer> map = new TreeMap<String, Integer>(uniqueString);
for (Map.Entry<String, Integer> entry : map.entrySet()) {
    // find every index of the word; note that indexOf also matches inside longer words
    int index = str.indexOf(entry.getKey());
    while (index >= 0) {
        output.collect(new Text(entry.getKey()), new LongWritable(index));
        index = str.indexOf(entry.getKey(), index + 1);
    }
}

output of this logic: am:20, are:7, are:50, hi:0, hi:15, hi:47, how:3, how:30, i:1, i:16, i:18, i:24, i:34, i:48, is:34, job.:42, ok.:58, right.:23, you:11, you:37, you:54, your:37

It might help you.
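The output above lists positions such as i:1 and i:16, which fall inside "hi": String.indexOf matches substrings, not whole words. If whole-word positions are what is wanted, one option is to scan for spaces and record where each token starts. The following is a minimal plain-Java sketch of that idea, outside Hadoop; the class and method names are hypothetical, and words are assumed to be separated by spaces only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class WordPositions {
    /** Maps each space-delimited word to the start indices of its whole-word occurrences. */
    static Map<String, List<Integer>> positions(String line) {
        Map<String, List<Integer>> result = new TreeMap<>(); // TreeMap keeps the words sorted
        int start = 0;
        while (start < line.length()) {
            int end = line.indexOf(' ', start);
            if (end < 0) {
                end = line.length(); // the last word runs to the end of the line
            }
            if (end > start) { // skip empty tokens produced by repeated spaces
                String word = line.substring(start, end);
                result.computeIfAbsent(word, k -> new ArrayList<>()).add(start);
            }
            start = end + 1;
        }
        return result;
    }

    public static void main(String[] args) {
        // prints {am=[20], are=[7], hi=[0, 15], how=[3], i=[18], right.=[23], you=[11]}
        System.out.println(positions("hi how are you hi i am right."));
    }
}
```

In a mapper, each (word, index) pair would be emitted separately, as in the answer's code; the shuffle then takes care of ordering the keys.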

answered Apr 05 '15 at 13:20

chandu kavar


Actually, I am new to map-reduce, so I am having a little difficulty writing map-reduce code and running it ... – Shivendra Pandey Apr 05 '15 at 16:21
Put the above code in the mapper; you just need to create an empty reducer, with no code in it. You will get the expected output. Wish you all the best. – chandu kavar Apr 05 '15 at 16:30
Hey Chandu, this helps me to some extent, but I am new to map-reduce programming and am having some difficulty fitting this piece of code into my full code; I am getting some errors. Could you please let me know which libraries / header files or any .dll I need to import or reference in the mapper code? I am running this code without a reducer. – Shivendra Pandey Apr 05 '15 at 16:43
Please find the new implemented code below: import java.io.IOException; import java.util.TreeMap; import org.apache.hadoop.io.LongWritable; import org.apache.hadoop.io.Text; import org.apache.hadoop.mapred.MRBench.Map; import org.apache.hadoop.mapred.MapReduceBase; import org.apache.hadoop.mapred.Mapper; import org.apache.hadoop.mapred.OutputCollector; import org.apache.hadoop.mapred.Reporter; import org.hsqldb.lib.HashMap; //do I need some more libraries or these are sufficient – Shivendra Pandey Apr 05 '15 at 16:45
Input file: hi how are you.hi how is your job. Suppose this is a two-line text file. The output for this file should be: are 7, hi 0, hi 15, how 3, how 18, ... Every output line should start with its "key" followed by its "index" value. The next line shouldn't start from index 0; it should continue the indexing of the first line. That is, the second line should continue from index value 15. – Shivendra Pandey Apr 06 '15 at 04:07

score 1 · Accepted Answer · answered Apr 05 '15 at 17:47

Please run the code below; it runs fine and gives your expected output.

Provide the input and output paths as command-line arguments (args[0], args[1]).

import java.io.IOException;
import java.util.*;
import java.util.Map.Entry;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;


public class IndexCount {

    public static class Map extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, IntWritable> {

        public void map(LongWritable key, Text value,
                        OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            String str = value.toString();
            String[] tokens = str.split(" "); // split into words

            // collect the unique words, skipping empty tokens
            // (indexOf("") never returns -1, so the search loop would not end)
            HashMap<String, Integer> uniqueString = new HashMap<String, Integer>();
            for (int i = 0; i < tokens.length; i++) {
                if (!tokens[i].isEmpty()) {
                    uniqueString.put(tokens[i], 1);
                }
            }

            // copy into a TreeMap so the words are visited in sorted order
            TreeMap<String, Integer> map = new TreeMap<String, Integer>(uniqueString);
            for (Entry<String, Integer> entry : map.entrySet()) {
                // find every index of the word within this line
                int index = str.indexOf(entry.getKey());
                while (index >= 0) {
                    output.collect(new Text(entry.getKey()), new IntWritable(index));
                    index = str.indexOf(entry.getKey(), index + 1);
                }
            }
        }
    }

    // identity reducer: passes every (word, index) pair through unchanged
    public static class Reduce extends MapReduceBase
            implements Reducer<Text, IntWritable, Text, IntWritable> {

        public void reduce(Text key, Iterator<IntWritable> values,
                           OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            while (values.hasNext()) {
                output.collect(key, new IntWritable(values.next().get()));
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // use this class (not WordCount) so the job jar is located correctly
        JobConf conf = new JobConf(IndexCount.class);
        conf.setJobName("indexfinder");

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        conf.setMapperClass(Map.class);
        conf.setCombinerClass(Reduce.class);
        conf.setReducerClass(Reduce.class);
        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }
}

Input: "he is harry Gupta this is gupta harry hi harry how are you. hi Gupta how are you look good Gupta how". If there is a line break, the code starts from 0 again, but the index value shouldn't restart from zero; it should continue with the next index value. This code gives hi 0, his 0, this 0, hi 0, is 2, is 1, is 4, is 5, but every line should start from the previous line's character index plus the current character index. – Shivendra Pandey, Apr 05 '15 at 19:07
Hi, Please take small example. And write its expected output. so I can easily understand. Take two or three line example. — chandu kavar, Apr 05 '15 at 19:18
Input file: hi how are you.hi how is your job. Suppose this is a two-line text file. The output for this file should be: are 7, hi 0, hi 15, how 3, how 18, ... Every output line should start with its "key" followed by its "index" value. The next line shouldn't start from index 0; it should continue the indexing of the first line. That is, the second line should continue from index value 15. – Shivendra Pandey, Apr 06 '15 at 04:07
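One reading of the requirement in the comments above is that each reported index should be a position within the whole file rather than within the line. With TextInputFormat, the LongWritable key passed to map is the byte offset of the line's start in the file, so adding key.get() to the in-line index would give such a position without any shared state between calls. The plain-Java sketch below simulates this outside Hadoop: lineOffset plays the role of key.get(), and the class and method names are hypothetical. Real offsets also count the newline that ends each line, so the second line here starts at 16, whereas the asker's example (second "hi" at 15) does not count it.

```java
import java.util.ArrayList;
import java.util.List;

final class LineOffsetIndex {
    /** Builds "word<TAB>position" records; lineOffset stands in for key.get(). */
    static List<String> mapLine(long lineOffset, String line) {
        List<String> records = new ArrayList<>();
        int start = 0;
        while (start < line.length()) {
            int end = line.indexOf(' ', start);
            if (end < 0) {
                end = line.length(); // the last word runs to the end of the line
            }
            if (end > start) { // skip empty tokens produced by repeated spaces
                records.add(line.substring(start, end) + "\t" + (lineOffset + start));
            }
            start = end + 1;
        }
        return records;
    }

    public static void main(String[] args) {
        String[] lines = {"hi how are you.", "hi how is your job."};
        long offset = 0;
        for (String line : lines) {
            mapLine(offset, line).forEach(System.out::println);
            offset += line.length() + 1; // +1 for the '\n' that ends each line
        }
    }
}
```

Byte offsets equal character offsets only for single-byte text such as plain ASCII, which is an assumption of this sketch.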

score 1 · Answer 3 · answered Apr 06 '15 at 07:18

Please run the code below; it gives the expected output.

import java.io.IOException;
import java.util.*;
import java.util.Map.Entry;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class Index {

    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {

        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String str = value.toString();
            String[] tokens = str.split(" "); // split into words

            // collect the unique words, skipping empty tokens
            // (indexOf("") never returns -1, so the search loop would not end)
            HashMap<String, Integer> uniqueString = new HashMap<String, Integer>();
            for (int i = 0; i < tokens.length; i++) {
                if (!tokens[i].isEmpty()) {
                    uniqueString.put(tokens[i], 1);
                }
            }

            // copy into a TreeMap so the words are visited in sorted order
            TreeMap<String, Integer> map = new TreeMap<String, Integer>(uniqueString);

            // running offset of this line; a value set here is only visible inside
            // the current task, so it carries across lines of a single mapper only
            Configuration conf = context.getConfiguration();
            int strIndex = conf.getInt("index", 0);

            for (Entry<String, Integer> entry : map.entrySet()) {
                // find every index of the word within this line
                int index = str.indexOf(entry.getKey());
                while (index >= 0) {
                    // add the offset only to the emitted value, not to the search position
                    context.write(new Text(entry.getKey()), new IntWritable(index + strIndex));
                    index = str.indexOf(entry.getKey(), index + 1);
                }
            }
            conf.setInt("index", strIndex + str.length());
        }
    }

    // identity reducer: passes every (word, index) pair through unchanged
    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            for (IntWritable val : values) {
                context.write(key, new IntWritable(val.get()));
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.setInt("index", 0);

        Job job = new Job(conf, "index");
        job.setJarByClass(Index.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        FileInputFormat.addInputPath(job, new Path("input"));
        FileOutputFormat.setOutputPath(job, new Path("output"));

        job.waitForCompletion(true);
    }
}
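A detail worth checking in any search loop of this kind is that the position used to continue the search stays relative to the current line, while the base offset is added only to the value that is written out. Feeding an offset-adjusted value back into indexOf would start the next search too far along and skip later occurrences. The sketch below shows the pattern in plain Java, outside Hadoop; the class and method names are hypothetical, and base stands in for the running offset (the "index" value kept in the configuration).

```java
import java.util.ArrayList;
import java.util.List;

final class OffsetSearch {
    /** Returns base + i for every position i at which word occurs in line. */
    static List<Integer> globalPositions(String line, String word, int base) {
        List<Integer> out = new ArrayList<>();
        if (word.isEmpty()) {
            return out; // indexOf("") never returns -1, so bail out early
        }
        int local = line.indexOf(word); // position within this line only
        while (local >= 0) {
            out.add(base + local);                  // report the global position
            local = line.indexOf(word, local + 1);  // continue from the local position
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [15, 22]
        System.out.println(globalPositions("hi how hi", "hi", 15));
    }
}
```

Had the loop continued from 15 + 1 instead of 0 + 1, the second "hi" at local position 7 would never be found, because the line is only 9 characters long.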

In map-reduce programming, can we write user-defined classes? If we want to use a linked list data structure, can we do this? – Shivendra Pandey, Apr 06 '15 at 10:55
Yes, we can do it. And please mark my answer as the right answer; that will help me build my profile. – chandu kavar, Apr 06 '15 at 12:49
Would you please post a data structure example where we have created user-defined functions and classes? – Shivendra Pandey, Apr 06 '15 at 14:41
Shivendra, please ask one question with detailed requirements, stating exactly what you want to do. That will help me give a proper answer with minimum communication. – chandu kavar, Apr 06 '15 at 16:25
I want to store words (strings) in a hash table using hashing with linear probing. I know Java, but map-reduce programming is a bit difficult. – Shivendra Pandey, Apr 06 '15 at 17:12
Can you please create a new question, with an example, and explain the example? – chandu kavar, Apr 06 '15 at 17:33
I want to implement indexing in Hadoop using a map-reduce program. Take a text file: how is your job. how is your family. Get the index value of every word, then apply a hash function (modulus, '%') to store it in a hash table; if there is a collision at a location, go to the next one and store it there. Given how 0, how 14, is 3, is 18, job 12, your 7, apply hashing to this data with a modulus equal to the number of distinct elements in the file, say 4. Then store: 0%4=0 (how at hash index 0), 14%4=2 (how at hash index 2), 18%4=2 (is at hash index 3 because of the collision), 7%4=3 (your at index 4 because of the collision). – Shivendra Pandey, Apr 07 '15 at 03:12
I understood your problem. But how can I put the answer to this question under your previous question? Please raise a new question, so the answer will be useful to others as well. Otherwise no one can match the question with the answer; no one reads questions in comments. – chandu kavar, Apr 07 '15 at 04:05
I have just posted my question with full details: http://stackoverflow.com/questions/29476909/map-reduce-programming-in-data-structure please see this. – Shivendra Pandey, Apr 07 '15 at 04:24
Your question is on hold. Please make it clear; change it so anyone can understand. Don't say you want to implement a data structure in map-reduce. Just state your requirement: what and why. – chandu kavar, Apr 07 '15 at 04:30
I am not able to edit that question. If you understood, please help me build a hash function for the above code. Thanks. – Shivendra Pandey, Apr 07 '15 at 04:51
I have posted my problem there with full details, so please check it and reply: http://stackoverflow.com/questions/29486393/map-reduce-program-to-implement-data-structure-in-hadoop-framework – Shivendra Pandey, Apr 07 '15 at 07:45
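The scheme described in the comments above (slot = index % table size, move to the next slot on a collision) is linear probing. A minimal plain-Java sketch of the insertion step follows, outside Hadoop; the class and method names are hypothetical. It assumes non-negative index values and 0-based slots, and it wraps around at the end of the table, so with a table of size 4 the last entry of the comment's example lands in slot 1 rather than "index 4".

```java
final class LinearProbingTable {
    private final String[] words;
    private final int[] positions;

    LinearProbingTable(int capacity) {
        words = new String[capacity];
        positions = new int[capacity];
    }

    /**
     * Stores (word, position) starting at position % capacity and probing
     * forward on collision. Returns the slot used, or -1 if the table is full.
     */
    int put(String word, int position) {
        int capacity = words.length;
        int home = position % capacity;
        for (int step = 0; step < capacity; step++) {
            int slot = (home + step) % capacity; // wrap around at the end of the table
            if (words[slot] == null) {
                words[slot] = word;
                positions[slot] = position;
                return slot;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        LinearProbingTable table = new LinearProbingTable(4);
        System.out.println(table.put("how", 0));   // home slot 0 is free
        System.out.println(table.put("how", 14));  // home slot 2 is free
        System.out.println(table.put("is", 18));   // slot 2 is taken, probes to 3
        System.out.println(table.put("your", 7));  // slot 3 is taken, wraps past 0 to 1
    }
}
```

A table with exactly as many slots as entries fills up completely, so a real implementation would usually allow spare capacity; that choice is left open here.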
