
My list is:

search = ['1രാമന്‍', '2സീതയെ', '7പൂവ്‌', '16കോട്ടയത്ത്‌', '22പരീക്ഷ', '28രാമന്‍', '29ലക്ഷ്മനനെ', '33രാമനോടു', '36ലക്ഷ്മണന്‍', '37സീതയെ', '45വഴ']

My input file contains:

1രാമന്‍ N_NNP_S_M_SG 1    
2സീതയെ N_NNP_O_F_SG 1  
4. RD_PUNC 0  
1രാമന്‍,5അവന്‍ PR_PRP_S_M_SG 1  
2സീതയെ,6അവള്‍ക്ക് PR_PRP_O_F_SG 1  
7പൂവ്‌ N_NN_O_NU_SG 1  
9. RD_PUNC 0  
2സീതയെ,6അവള്‍ക്ക്,10അവള്‍ PR_PRP_S_F_SG 2  
7പൂവ്‌,11അത്‌ DM_DMD 1  
13. RD_PUNC 0  
2സീതയെ,1രാമന്‍,14അവര്‍ PR_PRP_S_PL 3  
16കോട്ടയത്ത്‌ N_NST_O_SG 2  
18. RD_PUNC 0  
16കോട്ടയത്ത്‌,19അവിടെ N_NST_NU 5  
2സീതയെ,1രാമന്‍,21അവര്‍ക്ക്‌ PR_PRP_S_PL 6  
22പരീക്ഷ N_NN_O_NU 4  
25. RD_PUNC 0  
16കോട്ടയത്ത്‌,19അവിടെ,26അവിടെ N_NST_NU 11  
28രാമന്‍ N_NNP_S_M_SG 11  
29ലക്ഷ്മനനെ N_NNP_O_M_SG 9  
31. RD_PUNC 0  
28രാമന്‍,32അവന്‍ PR_PRP_S_M_SG 33  
33രാമനോടു N_NN_O_M 18  
35. RD_PUNC 0  
36ലക്ഷ്മണന്‍ N_NNP_S_M_SG 45  
37സീതയെ N_NNP_O_F_SG 37  
39. RD_PUNC 0  
36ലക്ഷ്മണന്‍,40അവനെ PR_PRP_S_M_SG 135  
37സീതയെ,41അവള്‍ക്ക്‌ PR_PRP_O_F_SG 112  
43. RD_PUNC 0  
45വഴ,44ഈ DM 100  
45വഴ N_NN_O_NU 150  
37സീതയെ,36ലക്ഷ്മണന്‍,47അവര്‍ PR_PRP_S_PL 262  
50. RD_PUNC 0  
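Each line above appears to follow the pattern `<refs> <TAG> <N>`: a comma-separated list of number-prefixed words, a part-of-speech tag, and a trailing integer whose meaning is not stated in the question. A minimal Python 3 sketch of a parser for that layout, assuming exactly three whitespace-separated fields per line (`parse_line` is a hypothetical helper, not part of the original code):

```python
import re

# Leading ASCII digits, then the rest of the token (the word, or ".").
REF_RE = re.compile(r"([0-9]+)(.*)")


def parse_line(line):
    """Hypothetical helper: parse '<refs> <TAG> <N>' into (refs, tag, n).

    refs is a list of (number, word) pairs built from the comma-separated
    first field. Returns None for blank or malformed lines.
    """
    # Drop a UTF-8 byte-order mark if present; split() also removes
    # trailing spaces and "\n" / "\r\n" line endings.
    fields = line.lstrip("\ufeff").split()
    if len(fields) != 3:
        return None
    refs_field, tag, n = fields
    if not n.isdigit():
        return None  # trailing field is not an integer
    refs = []
    for token in refs_field.split(","):
        m = REF_RE.match(token)
        if m is None:
            return None  # token does not start with a digit
        refs.append((int(m.group(1)), m.group(2)))
    return refs, tag, int(n)
```

For example, `parse_line('4. RD_PUNC 0')` returns `([(4, '.')], 'RD_PUNC', 0)`, which is one way to recognise the punctuation lines that close each segment.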

I need the initial and final position of each item in the list, so that I can use them to split the input file.

My expected output is:

1രാമന്‍ 1 to 25  
28രാമന്‍ 26 to 35  
36ലക്ഷ്മണന്‍ 36 to 50  

I don't want to search for 2സീതയെ, 7പൂവ്‌, 16കോട്ടയത്ത്‌ or 22പരീക്ഷ, because their numbers are less than 25, the end of the first segment. The search should then continue with 28രാമന്‍, and so on.
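The splitting rule described above can be sketched in Python 3 (the question's code is Python 2). The rule is inferred from the expected output rather than stated in full, so treat it as an assumption: a segment for an item ends at the first `RD_PUNC` line after the item's last mention, the next segment starts at the following number, and items numbered at or below the previous end are skipped. `segments`, `leading_number` and `first_field_tokens` are hypothetical names.

```python
import re


def leading_number(token):
    """Integer prefix of a token such as '28രാമന്‍' or '25.', or None."""
    m = re.match(r"[0-9]+", token)
    return int(m.group()) if m else None


def first_field_tokens(line):
    """Comma-separated tokens of a line's first field (blank lines give [])."""
    fields = line.split()
    return fields[0].split(",") if fields else []


def segments(search, lines):
    """Return (item, start, end) for each search item that opens a new segment."""
    result = []
    end = 0  # end of the previous segment; 0 before the first one
    for item in search:
        n = leading_number(item)
        if n is None or n <= end:
            continue  # already covered by an earlier segment
        # Indexes of lines whose first field contains item as a whole token.
        hits = [i for i, line in enumerate(lines) if item in first_field_tokens(line)]
        if not hits:
            continue
        # First punctuation line at or after the item's last mention;
        # if none follows, the item is skipped.
        for line in lines[hits[-1]:]:
            fields = line.split()
            if len(fields) >= 2 and fields[1] == "RD_PUNC":
                stop = leading_number(fields[0])
                if stop is not None:
                    result.append((item, end + 1, stop))
                    end = stop
                    break
    return result
```

To run it on the file from the question, the lines could be read with `open('outputfile5.txt', encoding='utf-8-sig')` (this codec also drops a leading byte-order mark), and `for item, start, stop in segments(search, lines): print(item, start, 'to', stop)` prints one line per segment in the format of the expected output.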

My code is:

import unicodedata  
import codecs      
import string    
import re      


fr = codecs.open('outputfile5.txt', encoding='utf-8')  
lines = fr.readlines()
t=0

for item in search:

    for i in range(0+t,len(lines)):
         if item in lines[i]:
             line1=lines[i]
         if line1:
             x=line1.split(",")
         lineno=[]
         for y in x:
                s = re.match('([0-9]+)', y).group(1)
                print int(s), y[len(s):]
                lineno.append(int(s))
        lineno.sort()
        initial=lineno[0]
        t=i
        for k in range(t,len(lines)):
            punc=lines[k].split()
            q=punc[0]

        b = re.match('([0-9]+)', q).group(1)
        print int(b), i[len(b):]
        if (i[len(b):] =="."):      
            final=int(b)
            break       
    print "Start",initial   
    print "Final",final



fr.close()

An error occurred and I am stuck. The error is:

s = re.match('([0-9]+)', y).group(1)
AttributeError: 'NoneType' object has no attribute 'group'
user3251664
  • ...and what is the actual output? – Burhan Khalid Feb 26 '14 at 06:07
  • Edited the post. Please have a look; I am stuck because of the error shown above. – user3251664 Feb 26 '14 at 06:22
  • First of all, you need to fix your indentation (e.g. where is lineno.append(int(s)) supposed to be indented?). Second, are you sure your code reaches the line you say throws the exception? I don't see where you declare 'search' (line 5). Also include any imports used in this example, like codecs, re, etc. – skamsie Feb 26 '14 at 06:47
  • @HerrActress `search` was declared in the very first code block. – tsroten Feb 26 '14 at 08:07
  • What is `y`? Can you post it? It may not have a number at the beginning. – User Feb 26 '14 at 14:00
  • @User y represents each item in x. For example, if line1 = 2സീതയെ,1രാമന്‍,21അവര്‍ക്ക്‌ then x = [2സീതയെ, 1രാമന്‍, 21അവര്‍ക്ക്‌], and y takes each value in x in turn, i.e. 2സീതയെ, then 1രാമന്‍, and finally 21അവര്‍ക്ക്‌. – user3251664 Feb 28 '14 at 04:43
  • I know that, but which y gives the error? From my perspective it is possible that this is an empty line, or one that does not start with a number but with other characters. This can happen if you use a Mac and open a text file created on Windows. – User Feb 28 '14 at 09:58
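Following the suggestion in the comments to inspect `y`, a small Python 3 diagnostic can list every comma-separated token in a line's first field that does not start with an ASCII digit, which mirrors the values the question's loop passes to `re.match('([0-9]+)', y)` and the condition under which it returns `None`. `find_bad_tokens` and the sample lines are hypothetical; printing with `repr()` makes invisible characters such as a byte-order mark (`'\ufeff'`) visible.

```python
import re


def find_bad_tokens(lines):
    """Return (line_number, token) pairs where the token does not start with a digit.

    Tokens are the comma-separated parts of each line's first field; a blank
    line yields a single empty token.
    """
    bad = []
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        first = fields[0] if fields else ""
        for token in first.split(","):
            if re.match("([0-9]+)", token) is None:
                bad.append((lineno, token))
    return bad


# Hypothetical sample: a BOM before the first line and a blank third line.
sample = ["\ufeff1രാമന്‍ N_NNP_S_M_SG 1\n", "2സീതയെ N_NNP_O_F_SG 1\n", "\n", "4. RD_PUNC 0\n"]
for lineno, token in find_bad_tokens(sample):
    print(lineno, repr(token))
```

If only line 1 is reported on the real file, the file most likely starts with a byte-order mark.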

1 Answer


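Most probably re.match('([0-9]+)', y) did not find a match and returned None, so there is no group(1) to get. re.match only matches at the beginning of y (it does not require the whole of y to match), so it fails whenever y starts with something other than a digit: a space, a byte-order mark, or nothing at all in the case of an empty string. re.search, which scans the whole of y for a match, may be what you need here. In any case, you should test the match result before accessing the group: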

temp = re.match('([0-9]+)', y)
if temp:
    s=temp.group(1)
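To make the difference concrete: `re.match` is anchored at the start of the string, `re.search` scans the whole string, and `re.fullmatch` (Python 3.4+) is the variant that requires the entire string to match. A short Python 3 sketch using a hypothetical token with an invisible byte-order mark in front:

```python
import re

# Hypothetical token with an invisible byte-order mark (U+FEFF) in front.
y = "\ufeff21അവര്‍ക്ക്‌"

# match() only tries position 0, where the BOM sits, so it finds nothing.
m = re.match("([0-9]+)", y)
print(m)  # None

# search() scans forward and finds the first run of digits.
s = re.search("([0-9]+)", y)
print(s.group(1) if s else None)  # 21

# fullmatch() is the variant that must match the entire string.
print(re.fullmatch("([0-9]+)", "21") is not None)     # True
print(re.fullmatch("([0-9]+)", "21abc") is not None)  # False
```

One caveat with switching to `re.search`: it also accepts tokens whose digits are not at the front (for example `'abc5'` yields `'5'`), so a malformed token would be silently accepted instead of being reported.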
GAM PUB