I'm new to MapReduce and I'm trying to join two different types of lines from two different CSV files.
The map phase is fine: I load the two files, A and B, and emit the lines I want to match under the same key.
In the reducer I am seeing very strange behavior that I cannot understand. Lines from A start with accident# and lines from B start with meteo#.
I want to identify whether a line comes from A or B and then get the rest of the line, but when I test this code:
    for (Text val : values) {
        StringTokenizer line = new StringTokenizer(val.toString(), "#");
        String comparable = line.nextToken();
        context.write(key, new Text(comparable));
    }
I get the following output, which is correct:
2015-12-31;X meteo
2015-12-31;X accident
2015-12-31;X accident
2015-12-31;X accident
2015-12-31;X accident
Then I do this:
    for (Text val : values) {
        StringTokenizer line = new StringTokenizer(val.toString(), "#");
        String comparable = line.nextToken();
        if (comparable.equals("meteo"))
            comparable = line.nextToken();
        context.write(key, new Text(comparable));
    }
This prints:
2015-12-31;X ;17.8;14:00;9.1;04:40;25;12:20;19;19:00;0;0;0
2015-12-31;X accident
2015-12-31;X accident
2015-12-31;X accident
2015-12-31;X accident
which is also correct. Then I do the following to store the meteo line:
    String meteo;
    for (Text val : values) {
        meteo = "hi";
        StringTokenizer line = new StringTokenizer(val.toString(), "#");
        String comparable = line.nextToken();
        if (comparable.equals("meteo"))
            meteo = line.nextToken();
        context.write(key, new Text(meteo));
    }
The output is:
2015-12-31;X hi
2015-12-31;X hi
2015-12-31;X hi
2015-12-31;X hi
2015-12-31;X hi
whereas the result I expected was:
2015-12-31;X ;17.8;14:00;9.1;04:40;25;12:20;19;19:00;0;0;0
2015-12-31;X hi
2015-12-31;X hi
2015-12-31;X hi
2015-12-31;X hi
This is a simplification of my problem, but it shows the strange behavior. What I actually want is to append the meteo line to every accident line with the same key. That is my final objective, but if this does not work, I do not know how to do it. (My idea is to get the meteo line, store it, and then append it to every accident line.)
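The idea in the last sentence (keep the meteo line, then append it to every accident line) can be sketched in plain Java, outside Hadoop. This is only a minimal sketch under stated assumptions: the class `MeteoJoinSketch` and the method `join` are hypothetical names, the values are plain strings tagged `meteo#` or `accident#`, there is at most one meteo line per key, and the sample values in `main` are shortened, made-up lines.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the original job: joins the values of ONE key.
final class MeteoJoinSketch {

    // Assumes every value looks like "meteo#<rest>" or "accident#<rest>".
    static List<String> join(Iterable<String> values) {
        String meteo = null;                         // the meteo line for this key, if seen
        List<String> accidents = new ArrayList<>();  // accident lines buffered until the loop ends

        for (String val : values) {
            int sep = val.indexOf('#');
            if (sep < 0) {
                continue;                            // untagged value: skipped in this sketch
            }
            String tag = val.substring(0, sep);
            String rest = val.substring(sep + 1);
            if (tag.equals("meteo")) {
                meteo = rest;
            } else if (tag.equals("accident")) {
                accidents.add(rest);
            }
        }

        List<String> joined = new ArrayList<>();
        if (meteo == null) {
            return joined;                           // no meteo line for this key: emit nothing
        }
        for (String accident : accidents) {
            // Assumes the meteo part already starts with ';', as in the output shown above.
            joined.add(accident + meteo);
        }
        return joined;
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList(
                "accident#2015S009985;Eixample",
                "meteo#;17.8;14:00",
                "accident#2015S009987;Eixample");
        for (String row : join(values)) {
            System.out.println("2015-12-31;X\t" + row);
        }
    }
}
```

The accident lines are buffered because the meteo line may arrive at any position among the values. In a real reducer, each `Text` value would have to be copied with `toString()` before being stored, because Hadoop reuses the same `Text` instance while iterating over `values`, and the values can only be iterated once.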
EDIT
Below I add the mapper code and the exact input, to clarify the problem:
    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer lines = new StringTokenizer(value.toString(), "\n");
        while (lines.hasMoreTokens()) {
            StringTokenizer line = new StringTokenizer(lines.nextToken(), ";");
            String csvLine = ""; // this will be the output value
            String id = "";      // output key (assumed declaration; it is missing from the snippet as posted)
            String atr = line.nextToken(); // the first attribute distinguishes meteo lines from accident lines
            boolean isMeteo = false;
            if (atr.equals("0201X")) isMeteo = true;
            if (!isMeteo) { // accident line: look for the attributes that form the date in the key (i == 6, 7, 8)
                int i = 1;
                csvLine = atr;
                while (line.hasMoreTokens()) {
                    String aux = line.nextToken();
                    csvLine += ";" + aux;
                    if (i == 6) id = aux;
                    else if (i == 7 || i == 8) {
                        int x = Integer.parseInt(aux);
                        if (x < 10) aux = "0" + aux;
                        id += "-" + aux;
                    }
                    else if (i == 13) { // this is the X in the key, which identifies the meteo station (not important for my problem)
                        aux = aux.substring(0, aux.length() - 1);
                        id += ";" + aux;
                        csvLine = csvLine.substring(0, csvLine.length() - 1);
                    }
                    ++i;
                }
            }
            else if (isMeteo) {
                id = line.nextToken(); // the second column holds the complete date string
                id += ";X";            // this file has the data of meteo station X
                csvLine += ";" + toCsvLine(line);
            }
            Text outKey = new Text(id);
            Text outValue = new Text(csvLine);
            context.write(outKey, outValue);
        }
    }

    // Joins the remaining tokens of the line with ';'
    public String toCsvLine(StringTokenizer st) {
        String x = st.nextToken();
        while (st.hasMoreTokens()) {
            x += ";" + st.nextToken();
        }
        return x;
    }
In the accidents file, I take the year, month, and day columns to build the day ID (year-month-day); in the meteo file I only take the column that holds the full date as the ID. csvLine holds the CSV line that I want to output. Then I write the key (id) and the value (csvLine).
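For reference, the key construction described above can be isolated into a small standalone sketch. The class `DayKeySketch` and its methods are hypothetical names used only for illustration. The column positions follow the mapper (0-based columns 6, 7, and 8 for year, month, and day, column 13 for the station), and `trim()` is an assumption that stands in for the mapper dropping the last character of the final column. Unlike `StringTokenizer`, `String.split` keeps empty fields between delimiters, so column positions in lines with empty fields can differ between the two approaches.

```java
import java.util.Locale;

// Hypothetical helper, not part of the original job: builds the join key for each file.
final class DayKeySketch {

    // Accident line -> "yyyy-MM-dd;station", zero-padding month and day.
    static String accidentKey(String csvLine) {
        String[] cols = csvLine.trim().split(";");   // trim() removes a trailing '\r', if present
        int month = Integer.parseInt(cols[7]);
        int day = Integer.parseInt(cols[8]);
        return String.format(Locale.ROOT, "%s-%02d-%02d;%s", cols[6], month, day, cols[13]);
    }

    // Meteo line -> "<date column>;station"; the station is passed in because
    // the file itself only contains data for one station.
    static String meteoKey(String csvLine, String station) {
        String[] cols = csvLine.split(";");
        return cols[1] + ";" + station;
    }

    public static void main(String[] args) {
        String accident = "2015S009983;Ciutat Vella;la Barceloneta;Mar;Dc;Laboral;"
                + "2015;12;30;22;Altres;4581220,92;432258,31;X";
        String meteo = "0201X;2015-12-30;18.6;14:50;12.2;07:00;;26;13:20;17;13:10;;0;;;";
        System.out.println(accidentKey(accident));
        System.out.println(meteoKey(meteo, "X"));
    }
}
```

Both calls in `main` should yield the same key for the same day and station, which is what lets the reducer see the meteo line and the accident lines together.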
And here is the input data (only 2 days, as a representative example):
meteoX.csv :
0201X;2015-12-30;18.6;14:50;12.2;07:00;;26;13:20;17;13:10;;0;;;
0201X;2015-12-31;17.8;14:00;9.1;04:40;;25;12:20;19;19:00;;;0;0;0
accidents.csv :
2015S009983;Ciutat Vella;la Barceloneta;Mar;Dc;Laboral;2015;12;30;22;Altres;4581220,92;432258,31;X
2015S009984;Sant Martí;Sant Martí de Provençals;Cantàbria;Dc;Laboral;2015;12;30;20;Col.lisió fronto-lateral;4585862,62;433330,95;X
2015S009985;Eixample;la Nova Esquerra de l'Eixample;Calàbria;Dj;Laboral;2015;12;31;00;Caiguda (dues rodes);4582094,15;428800,57;X
2015S009987;Eixample;la Dreta de l'Eixample;Gràcia;Dj;Laboral;2015;12;31;02;Col.lisió lateral;4582944,96;430133,41;X
2015S009988;Eixample;la Nova Esquerra de l'Eixample;Aragó;Dj;Laboral;2015;12;31;07;Abast;4581873,45;429312,63;X
2015S009989;Ciutat Vella;la Barceloneta;Marítim de la Barceloneta;Dj;Laboral;2015;12;31;08;Abast;4581518,06;432606,87;X