
I am trying to code a MapReduce scenario in which I have created some user ClickStream data in the form of JSON. After that I wrote a Mapper class to fetch the required fields from the file. My mapper code is:

public class ClickStreamMapper extends Mapper<LongWritable, Text, Text, LongWritable> {

private final static String URL = "u";

private final static String Country_Code = "c";

private final static String Known_User = "nk";

private final static String Session_Start_time = "hc";

private final static String User_Id = "user";

private final static String Event_Id = "event";

public void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
    String aJSONRecord = value.toString();
    try {
        JSONObject aJSONObject = new JSONObject(aJSONRecord);
        StringBuilder aOutputString = new StringBuilder();
        aOutputString.append(aJSONObject.get(User_Id).toString()+",");
        aOutputString.append(aJSONObject.get(Event_Id).toString()+",");
        aOutputString.append(aJSONObject.get(URL).toString()+",");
        aOutputString.append(aJSONObject.get(Known_User)+",");
        aOutputString.append(aJSONObject.get(Session_Start_time)+",");
        aOutputString.append(aJSONObject.get(Country_Code)+",");
        context.write(new Text(aOutputString.toString()), key);
        System.out.println(aOutputString.toString());
    } catch (JSONException e) {
        e.printStackTrace();
    }
}

}

And my reducer code is:

public void reduce(Text key, Iterable<LongWritable> values,
        Context context) throws IOException, InterruptedException {
        String aString =  key.toString();
        context.write(new Text(aString.trim()), new Text(""));  

}

And my partitioner code is:

public int getPartition(Text key, LongWritable value, int numPartitions) {
    String aRecord = key.toString();
    if(aRecord.contains(Country_code_Us)){
        return 0;
    }else{
        return 1;
    }
}

And here is my driver code

public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "Click Stream Analyzer");
    job.setNumReduceTasks(2);
    job.setJarByClass(ClickStreamDriver.class);
    job.setMapperClass(ClickStreamMapper.class);
    job.setReducerClass(ClickStreamReducer.class);
    job.setPartitionerClass(ClickStreamPartitioner.class);
    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(LongWritable.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);

}

Here I am trying to partition my data on the basis of the country code, but it is not working. Every record is going to a single reducer output file, and I think it is the file other than the one created for the US partition.

One more thing: when I look at the output of the mappers, there is some extra space added at the end of each record.

Please suggest if I am making any mistake here.

rraghuva
  • What is `Country_code_Us`? – Ben Watson Nov 25 '15 at 10:03
  • Country_code_us = "US"; – rraghuva Nov 25 '15 at 10:25
  • And you're sure that your input data has `US` in it somewhere? It's kind of unrelated, but I'm not sure why you're outputting the `LongWritable` from the mapper. Output `NullWritable.get()` and set the value output format to `NullWritable.class`. Do the same for the reducer value. Doing `new Text("")` for every key is going to be a big resource drain! – Ben Watson Nov 25 '15 at 10:34
  • Yes, I have US there in my records. – rraghuva Nov 25 '15 at 10:40
  • Can you add a sysout into the `getPartition` method to confirm whether it's actually hitting it? – Ben Watson Nov 25 '15 at 10:43
  • Hi Ben, I have used NullWritable and it works. Now I can see the records getting partitioned into different files. Thanks for the help :-) – rraghuva Nov 25 '15 at 10:48
  • Well that was unexpected... I was just suggesting that for efficiency and clarity - I didn't expect it to solve your issue! – Ben Watson Nov 25 '15 at 10:49

3 Answers


Your problem with the partitioning is due to the number of reducers. If it is 1, all your data will be sent to it, independently of what you return from your partitioner. Thus, setting mapred.reduce.tasks to 2 will solve this issue. Or you can simply write:

job.setNumReduceTasks(2);

in order to have the 2 reducers you want.
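
As a minimal sketch (assuming the Hadoop 2.x property name; 1.x uses mapred.reduce.tasks), the two equivalent ways of requesting the reducers look like this:

// Two equivalent ways to ask for 2 reduce tasks (Hadoop 2.x property name assumed).
Configuration conf = new Configuration();
conf.setInt("mapreduce.job.reduces", 2);   // via the configuration property
Job job = Job.getInstance(conf, "Click Stream Analyzer");
job.setNumReduceTasks(2);                  // or via the Job API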

mgaido

Unless you have a very specific requirement, you can set the number of reducers as a job parameter, as below.

mapred.reduce.tasks (in 1.x) & mapreduce.job.reduces (in 2.x)

Or

job.setNumReduceTasks(2) as per mark91's answer.

Or leave the job to the Hadoop framework by using the API below; the framework will decide the number of reducers as per the file and block sizes.

job.setPartitionerClass(HashPartitioner.class);
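
If you want to pass those properties on the command line instead of hard-coding them, the driver has to go through ToolRunner so that -D options are parsed. A rough sketch (only the configuration plumbing is shown; the ClickStreamDriver name comes from the question, everything else is assumed):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Example invocation (jar name is illustrative):
// hadoop jar clickstream.jar ClickStreamDriver -D mapreduce.job.reduces=2 <input> <output>
public class ClickStreamDriver extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        // getConf() already contains any -D overrides parsed by ToolRunner.
        Job job = Job.getInstance(getConf(), "Click Stream Analyzer");
        job.setJarByClass(ClickStreamDriver.class);
        // ... mapper, reducer, partitioner, key/value classes and paths as in the question ...
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Configuration(), new ClickStreamDriver(), args));
    }
}
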
Ravindra babu

I have used NullWritable and it works. Now I can see the records getting partitioned into different files. Since I was using a LongWritable as a placeholder value instead of NullWritable, a space was added at the end of each line; because of this, US was written as "US " and the partitioner was not able to split the records.
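
For reference, here is a simplified sketch of the change (not my exact code; the JSON field names are the ones from the question), with the mapper and reducer emitting NullWritable instead of the LongWritable key:

// ClickStreamMapper.java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.json.JSONException;
import org.json.JSONObject;

public class ClickStreamMapper extends Mapper<LongWritable, Text, Text, NullWritable> {

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        try {
            JSONObject record = new JSONObject(value.toString());
            // Same fields as before: user, event, u, nk, hc, c
            String line = record.get("user") + "," + record.get("event") + ","
                    + record.get("u") + "," + record.get("nk") + ","
                    + record.get("hc") + "," + record.get("c");
            context.write(new Text(line), NullWritable.get());
        } catch (JSONException e) {
            // skip malformed records
        }
    }
}

// ClickStreamReducer.java (separate file, imports omitted)
public class ClickStreamReducer extends Reducer<Text, NullWritable, Text, NullWritable> {

    @Override
    public void reduce(Text key, Iterable<NullWritable> values, Context context)
            throws IOException, InterruptedException {
        context.write(new Text(key.toString().trim()), NullWritable.get());
    }
}

// Driver and partitioner changes:
// job.setMapOutputValueClass(NullWritable.class);
// job.setOutputValueClass(NullWritable.class);
// The custom partitioner's value type also becomes NullWritable
// (extends Partitioner<Text, NullWritable>).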

rraghuva