
I'm new to MapReduce and I'm trying to join two different types of lines from two different CSV files.

The map phase works: I load the two files A and B and emit the lines I want to match under the same key.

In the reducer I am seeing very strange behavior that I cannot understand. Lines from A start with accident# and lines from B start with meteo#. I want to identify whether a line comes from A or B and then get the rest of the line, but when I test this code

        for(Text val : values){
            StringTokenizer line = new StringTokenizer(val.toString(), "#");
            String comparable = line.nextToken();
            context.write(key,new Text(comparable));
        } 

I get the following output, which is correct:

2015-12-31;X    meteo
2015-12-31;X    accident
2015-12-31;X    accident
2015-12-31;X    accident
2015-12-31;X    accident

Then I do this

        for(Text val : values){
            StringTokenizer line = new StringTokenizer(val.toString(), "#");
            String comparable = line.nextToken();
            if (comparable.equals("meteo"))
                comparable = line.nextToken();
            context.write(key,new Text(comparable));
        }

and I get

2015-12-31;X    ;17.8;14:00;9.1;04:40;25;12:20;19;19:00;0;0;0
2015-12-31;X    accident
2015-12-31;X    accident
2015-12-31;X    accident
2015-12-31;X    accident

which is also correct. Then I do the following to store the meteo value:

        String meteo;
        for(Text val : values){
            meteo = "hi";
            StringTokenizer line = new StringTokenizer(val.toString(), "#");
            String comparable = line.nextToken();
            if (comparable.equals("meteo"))
                meteo = line.nextToken();
            context.write(key,new Text(meteo));
        }

but the output is

2015-12-31;X    hi
2015-12-31;X    hi
2015-12-31;X    hi
2015-12-31;X    hi
2015-12-31;X    hi

when the expected result was

2015-12-31;X    ;17.8;14:00;9.1;04:40;25;12:20;19;19:00;0;0;0
2015-12-31;X    hi
2015-12-31;X    hi
2015-12-31;X    hi
2015-12-31;X    hi
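
To check the loop in isolation, it can be run outside Hadoop on plain strings. This is only a sketch: the class name `LoopCheck`, the method `run`, and the sample values are assumptions reconstructed from the outputs above, and `out.add(...)` stands in for `context.write(...)`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical stand-alone harness; it is not part of the real job.
class LoopCheck {

    // Same loop body as the reducer snippet, with plain Strings instead of Text.
    static List<String> run(List<String> values) {
        List<String> out = new ArrayList<>();
        String meteo;
        for (String val : values) {
            meteo = "hi";
            StringTokenizer line = new StringTokenizer(val, "#");
            String comparable = line.nextToken();
            if (comparable.equals("meteo"))
                meteo = line.nextToken();
            out.add(meteo); // stands in for context.write(key, new Text(meteo))
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed sample values for the key 2015-12-31;X
        List<String> values = Arrays.asList(
            "meteo#;17.8;14:00;9.1;04:40;25;12:20;19;19:00;0;0;0",
            "accident#2015S009985;Eixample",
            "accident#2015S009987;Eixample");
        for (String s : run(values)) {
            System.out.println("2015-12-31;X\t" + s);
        }
    }
}
```

On plain strings the loop yields the meteo payload for the meteo value and "hi" for the others, so if the job prints something else, the values reaching the reducer probably differ from these assumed ones.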

This is a simplification of my problem, but it shows the strange behavior. What I actually want is to append the meteo line to every accident line with the same key. That is my final objective, but if this simple case does not work, I do not know how to do it. (My idea is to get the meteo line, store it, and then append it to every accident line.)
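
The final objective could be sketched in plain Java as follows. This is an illustration under stated assumptions, not the real reducer: the class `MeteoJoinSketch` and the method `joinValues` are hypothetical names, the values are plain strings tagged `accident#` or `meteo#` as described above, and the meteo payload is assumed to start with `;` as in the outputs shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical plain-Java sketch of the reduce step; no Hadoop types are used.
class MeteoJoinSketch {

    // Joins all values of one key: every accident payload gets the meteo
    // payload appended. The position of the meteo value does not matter.
    static List<String> joinValues(Iterable<String> values) {
        String meteo = null;                     // declared outside the loop, never reset inside it
        List<String> accidents = new ArrayList<>();
        for (String val : values) {
            int sep = val.indexOf('#');
            if (sep < 0) continue;               // skip values without a tag
            String tag = val.substring(0, sep);
            String payload = val.substring(sep + 1);
            if (tag.equals("meteo")) {
                meteo = payload;                 // remember the meteo line
            } else if (tag.equals("accident")) {
                accidents.add(payload);          // buffer accident lines until the loop ends
            }
        }
        List<String> joined = new ArrayList<>();
        for (String accident : accidents) {
            // If no meteo line arrived for this key, the accident line is kept as is.
            joined.add(meteo == null ? accident : accident + meteo);
        }
        return joined;
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList(
            "accident#2015S009985;Eixample",
            "meteo#;17.8;14:00",
            "accident#2015S009987;Eixample");
        for (String line : joinValues(values)) {
            System.out.println(line);
        }
    }
}
```

Buffering the accident lines first means the result does not depend on where the meteo value appears among the values. In a real Hadoop reducer the values can only be iterated once and the framework reuses the same `Text` instance, so each value would need to be copied with `val.toString()` before it is buffered.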

EDIT

Below I add the mapper code and the exact input to clarify the problem.

 public void map(Object key, Text value, Context context
                 ) throws IOException, InterruptedException { 

   StringTokenizer lines = new StringTokenizer(value.toString(), "\n");
   while (lines.hasMoreTokens()){
        StringTokenizer line = new StringTokenizer(lines.nextToken(),";");
        String id = ""; //this will be the output key
        String csvLine = ""; //this will be the output value
        String atr = line.nextToken(); //the first attribute differentiates between meteo and accidents
        boolean isMeteo = atr.equals("0201X");
        if(!isMeteo){  //accident line: take the date attributes for the key (i==6,7,8)
                  int i=1;
                  csvLine=atr;
                  while(line.hasMoreTokens()){
                      String aux= line.nextToken();
                      csvLine+=";"+aux;
                      if(i==6) id =aux;
                      else if(i==7 || i==8){
                          int x = Integer.parseInt(aux);
                          if(x<10)aux = "0"+aux; //zero-pad month and day
                          id+="-"+aux;
                      }
                      else if(i==13){ //this is the X in the key, which identifies the meteo station (not important for my problem)
                          aux = aux.substring(0,aux.length()-1);
                          id+=";"+aux;
                          csvLine= csvLine.substring(0,csvLine.length()-1);
                      }
                      ++i;
                  }
        }
        else{
            id = line.nextToken(); //the second column holds the complete date string
            id+=";X";  //this file has the data of meteo station X
            csvLine+=";"+toCsvLine(line);
        }
        Text outKey = new Text(id);
        Text outValue = new Text(csvLine);
        context.write(outKey,outValue);
   }
 }

 public String toCsvLine(StringTokenizer st){
     String x = new String();
     x = st.nextToken();
     while(st.hasMoreTokens()){
         x+=";"+st.nextToken();
     }
     return x;
 }      

In the accidents file, I combine the year, month and day columns to build the day ID (year-month-day), and in the meteo file I use the single column that already holds the full date as the ID. csvLine holds the CSV line that I want to output. Then I write the key (id) and the value (csvLine).
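
The key construction described above can be sketched on its own. The class `DayKeySketch` and its methods are hypothetical names used only for illustration. Note two deliberate differences from the mapper: `String.format` is used for the zero padding and `trim()` for the station column, instead of the manual "0" prefix and `substring`.

```java
import java.util.Locale;

// Hypothetical helper showing how both files end up with the same key format.
class DayKeySketch {

    // Accident rows: separate year, month and day columns plus a station column.
    static String accidentKey(String year, String month, String day, String station) {
        return String.format(Locale.ROOT, "%s-%02d-%02d;%s",
                year.trim(),
                Integer.parseInt(month.trim()),   // padded to two digits by %02d
                Integer.parseInt(day.trim()),
                station.trim());                  // drops trailing whitespace after the station
    }

    // Meteo rows: the date column is already in year-month-day form.
    static String meteoKey(String date, String station) {
        return date.trim() + ";" + station.trim();
    }

    public static void main(String[] args) {
        System.out.println(accidentKey("2015", "12", "31", "X  "));
        System.out.println(meteoKey("2015-12-31", "X"));
    }
}
```

Both calls in `main` produce the same key string, which is what the join in the reducer relies on.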

And here we have the input data (only 2 days, for a representative example):

meteoX.csv :

 0201X;2015-12-30;18.6;14:50;12.2;07:00;;26;13:20;17;13:10;;0;;;
 0201X;2015-12-31;17.8;14:00;9.1;04:40;;25;12:20;19;19:00;;;0;0;0

accidents.csv :

 2015S009983;Ciutat Vella;la Barceloneta;Mar;Dc;Laboral;2015;12;30;22;Altres;4581220,92;432258,31;X 
 2015S009984;Sant Martí;Sant Martí de Provençals;Cantàbria;Dc;Laboral;2015;12;30;20;Col.lisió fronto-lateral;4585862,62;433330,95;X 
 2015S009985;Eixample;la Nova Esquerra de l'Eixample;Calàbria;Dj;Laboral;2015;12;31;00;Caiguda (dues rodes);4582094,15;428800,57;X  
 2015S009987;Eixample;la Dreta de l'Eixample;Gràcia;Dj;Laboral;2015;12;31;02;Col.lisió lateral;4582944,96;430133,41;X   
 2015S009988;Eixample;la Nova Esquerra de l'Eixample;Aragó;Dj;Laboral;2015;12;31;07;Abast;4581873,45;429312,63;X    
 2015S009989;Ciutat Vella;la Barceloneta;Marítim de la Barceloneta;Dj;Laboral;2015;12;31;08;Abast;4581518,06;432606,87;X    
oriolfm14
  • Could you include the `meteo` `String`? Is it the correct behavior that, when a token equals `meteo`, the next token should be included? – SomeJavaGuy Jun 09 '16 at 14:13
  • What I want is, when the String comparable is "meteo", save the value of the next Token. I want to save this next Token because I will use it after this for. – oriolfm14 Jun 09 '16 at 14:55
  • I ran your code and got your expected result. I had to guess what the input looked like, that is the main piece of information missing here. Perhaps add the input text that corresponds with the expected output. Other than that your code looks correct. – Binary Nerd Jun 10 '16 at 10:23
  • I've just added my mapper function and the input lines. – oriolfm14 Jun 13 '16 at 12:56

0 Answers