
I have several text files and I want to know the file and the line number where a word appears.

I get the file name correctly, but not the line number.

This is the mapper:

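#!/usr/bin/env python

import sys
import os

find = 'but'  # word to find
linesCont = 0  # lines seen so far by this map task

# Hadoop Streaming exposes the path of the current input file here
file = os.environ["map_input_file"]

for line in sys.stdin:
    words = line.strip().split()
    linesCont += 1
    for word in words:
        # compare whole words; `word in find` is a substring test
        # and would also match 'b', 'u', 'bu', ...
        if word == find:
            print('%s\t%s' % (file, linesCont))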

And this is the reducer:

#!/usr/bin/env python
import sys

result = {}  # file name -> formatted list of line numbers

for line in sys.stdin:
    line = line.strip()
    if not line:
        continue  # skip blank lines

    file, lineNumber = line.split('\t', 1)

    if file in result:
        result[file] = result[file] + ', ' + lineNumber
    else:
        result[file] = 'File "%s". LineNumber(s): %s' % (file, lineNumber)

for file in result:
    print('%s\t' % result[file])

Thanks a lot in advance

Carlos S
  • What do you get then? `1`? – Ofir Israel Oct 12 '13 at 12:39
  • possible duplicate of [Get Line number in map method using FileInputFormat](http://stackoverflow.com/questions/15543827/get-line-number-in-map-method-using-fileinputformat) – Praveen Sripati Oct 12 '13 at 12:44
  • I get a number, but it is not the real line number within the file, because the mapper gets pieces of several files and not in order, so the number is not a line number in any one file – Carlos S Oct 14 '13 at 09:04

1 Answer

There is a discussion of the same problem in the Apache forums, and another question about it on SO. There is also a code snippet for getting the file name of the block being processed.
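A per-task counter like the one in the question only gives real line numbers when a map task reads a file from its first line. Below is a minimal sketch under that assumption, i.e. that every input file reaches a single map task unsplit (small files, or a job configured so files are not split; that configuration is not shown here and depends on the Hadoop version). The names `matching_line_numbers` and `run_mapper` are made up for illustration, and unlike the original mapper this emits one record per matching line rather than one per occurrence.

```python
def matching_line_numbers(lines, word):
    """Yield 1-based numbers of the lines that contain `word` as a whole token."""
    for number, line in enumerate(lines, 1):
        if word in line.split():  # list membership, so 'butter' does not match 'but'
            yield number


def run_mapper(stream, filename, out, word='but'):
    """Write one 'filename<TAB>line number' record per matching line.

    Assumption: `stream` starts at the first line of `filename`; if the
    file was split, the numbers are relative to the split instead.
    """
    for number in matching_line_numbers(stream, word):
        out.write('%s\t%d\n' % (filename, number))
```

In a streaming mapper script this could be driven with `run_mapper(sys.stdin, os.environ["map_input_file"], sys.stdout)`, reusing the environment variable from the question. Keeping the matching logic in a plain function also makes it easy to check locally on a list of strings before submitting a job.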

Praveen Sripati
  • Thanks for the answer, but I don't know how to begin... The input files are .txt files. I don't mind switching to Java, but I haven't found any samples and I can't see the light, hehe – Carlos S Oct 14 '13 at 09:16