
I have a huge text file. Each line of the file contains 12 characters of data. I need to find a 5-character substring in that file using a MapReduce job.

Input file:

abcdefghijkl
kahfdjshjsdh
sdfkjsdjkjks

Value to search for:

cdefg

The value 'cdefg' can occur anywhere in the file, and it can span two lines. I don't know how to build a map entry from the last two characters of the current line and the first three characters of the next line.

Ibrar Ahmed

1 Answer


I have a file containing lines of 12 characters, and I want to find a 5-character string in that file. In the mapper I get each 12-character line; from it I can create two 5-character entries, which leaves 2 characters. I want to take the next 3 characters from the following line and create an entry from them as well, so that in the reducer I can compare those entries with my string.

You can concatenate all the lines together and then split the result into 5-character pieces (see "Splitting a string at every n-th character"):

abcdefghijklkahfdjshjsdhsdfkjsdjkjks
[abcde, fghij, klkah, fdjsh, jsdhs, dfkjs, djkjk, s]

You can take inspiration from this piece of code:

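import java.io.File;
import java.io.FileNotFoundException;
import java.util.Arrays;
import java.util.Scanner;

File file = new File("myFile.txt");
// try-with-resources closes the Scanner when done
try (Scanner scanner = new Scanner(file)) {
    // StringBuilder avoids copying the whole string on every append
    StringBuilder sb = new StringBuilder();
    while (scanner.hasNextLine()) {
        sb.append(scanner.nextLine());
    }
    String result = sb.toString();
    System.out.println(result);
    // split after every 5th character; the last piece may be shorter
    String[] spl = result.split("(?<=\\G.....)");

    System.out.println(Arrays.toString(spl));
} catch (FileNotFoundException e) {
    e.printStackTrace();
}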

Output

abcdefghijklkahfdjshjsdhsdfkjsdjkjks
[abcde, fghij, klkah, fdjsh, jsdhs, dfkjs, djkjk, s]
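
One thing worth checking with this approach: non-overlapping 5-character chunks only contain the search value when it happens to start at an offset that is a multiple of 5. The following is a minimal sketch of that check on the sample data; the class name ChunkCheck and the helper chunks are made up for illustration, and it uses substring instead of the regex split.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of the original answer.
public class ChunkCheck {

    // Cuts text into consecutive, non-overlapping pieces of the given size;
    // the last piece may be shorter.
    public static List<String> chunks(String text, int size) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < text.length(); i += size) {
            out.add(text.substring(i, Math.min(i + size, text.length())));
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "abcdefghijklkahfdjshjsdhsdfkjsdjkjks";
        List<String> parts = chunks(text, 5);
        System.out.println(parts.contains("abcde")); // true: starts at offset 0
        System.out.println(parts.contains("cdefg")); // false: starts at offset 2
    }
}
```

So for the value 'cdefg' from the question, the fixed chunks alone are not enough.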

EDIT

I want to create a map like this: abcdefghijklkahfdjshjsdhsdfkjsdjkjks [abcde, bcdef, cdefg, defgh... ]

You can solve this problem like so:

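import java.util.ArrayList;
import java.util.List;

String str = "abcdefghijklkahfdjshjsdhsdfkjsdjkjks";
List<String> list = new ArrayList<>();

// one 5-character window per start index; the last window starts at length - 5
for (int i = 0; i < str.length() - 4; i++) {
    list.add(str.substring(i, i + 5));
}
System.out.println(list);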

Output

[abcde, bcdef, cdefg, defgh, efghi, fghij, ghijk, ...., djkjk, jkjks]
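
For a huge file, building the whole concatenated string (or every window) in memory may not be practical. As a rough sketch under that assumption, the same sliding-window idea can be applied line by line by carrying the last 4 characters of the text seen so far into the next line, so a value that spans two lines is still found. This is plain Java, not a MapReduce job, and the names BoundarySearch and indexOfAcrossLines are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical helper class, not part of the original answer.
public class BoundarySearch {

    // Returns the 0-based offset of the first match in the concatenated text
    // (line breaks are not counted), or -1 if the target never occurs.
    public static long indexOfAcrossLines(Reader in, String target) {
        if (target.isEmpty()) {
            throw new IllegalArgumentException("target must not be empty");
        }
        int keep = target.length() - 1; // longest tail that can still begin a match
        String carry = "";              // tail of the text read so far
        long consumed = 0;              // characters read before the current line
        try (BufferedReader reader = new BufferedReader(in)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String window = carry + line;
                int hit = window.indexOf(target);
                if (hit >= 0) {
                    return consumed - carry.length() + hit;
                }
                consumed += line.length();
                carry = window.length() > keep
                        ? window.substring(window.length() - keep)
                        : window;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return -1;
    }

    public static void main(String[] args) {
        String sample = "abcdefghijkl\nkahfdjshjsdh\nsdfkjsdjkjks\n";
        // 'cdefg' sits inside the first line
        System.out.println(indexOfAcrossLines(new StringReader(sample), "cdefg")); // 2
        // 'klkah' spans the first and second lines
        System.out.println(indexOfAcrossLines(new StringReader(sample), "klkah")); // 10
    }
}
```

To read from a real file, a java.io.FileReader (or Files.newBufferedReader) can be passed in place of the StringReader.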
Youcef LAIDANI