I'm trying to identify all instances of a specific syntactic pattern found in a text: RB + NN|NNS|NP|PP. That is to say, I'm looking for adverbs that are immediately followed by nouns. I've tagged my text using TreeTagger. The tagged text is stored in a list called 'tags' that looks like this:

    how  WRB
    hard JJ
    it   PP
    was  VBD

This is the relevant part of my code:

    adverb = re.compile(r'RB$')
    noun = re.compile(r'NN')
    for n in range(len(tags)):
        w = tags[n]
        if adverb.search(w) != None and noun.search(w[n+1]) != None:
            print(' '.join(tags[n-2 : n+3]))

My problem is that the fifth line produces the following error:

     if adverb.search(w) != None and noun.search(w[n+1]) != None:
     IndexError: string index out of range

If the fifth line of code is this...

     if adverb.search(w) != None:

...then a list of adverbs is returned.

I'm really lost as to 1) why I am getting this error and 2) how I can fix it. Any guidance you guys can offer would be super appreciated.

Gabriel
  • Surely `w[n+1]` is the most likely source of your index error. What guarantee do you have that `n+1` is less than the length of your string `w`? – khelwood May 24 '16 at 13:19
  • I thought I was asking Python to look at the word after the adverb and check to see whether it's a noun? Is that not what I'm doing? – Gabriel May 24 '16 at 13:23
  • `w[n+1]` is not the word after anything. If `w` is a word, then `w[n+1]` is a letter in that word. – khelwood May 24 '16 at 13:28
  • I have no idea why you would be trying to access `w[n+1]`. It does not make any sense to me. My advice would be that you should think about what that piece of code is supposed to do and then write that, instead of what you currently have. – khelwood May 24 '16 at 13:28
  • I'm trying to access the next item after an adverb to see whether it is a noun. – Gabriel May 24 '16 at 13:31
  • OK, so which variable is your list? If your list is the variable `tags`, then you'd access it via `tags[...]`, not `w[...]`. And you would need to make sure that the index you are using is inside the range of the list. – khelwood May 24 '16 at 13:33
  • That was it! I replaced `w` with `tags` (my list) and it now works! More importantly, I also understand why I made that mistake. Can you post your answer so that I can give you credit? – Gabriel May 24 '16 at 13:39
  • OK. Glad we could figure it out. – khelwood May 24 '16 at 13:46

1 Answer

Your problem is this:

    w[n+1]

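You are confusing your list `tags` with a string in that list, `w`. Since `w` is a single string, `w[n+1]` is one character of that string, and once `n+1` reaches the length of the string you get `IndexError: string index out of range`. If you want to access another item in the list, you need to use `tags[...]`, not `w[...]`. Also, you should make sure that the index you are using is inside the range of the list.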
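
As a rough sketch of what that could look like, assuming each item of `tags` is a single string such as `'really\tRB'` (the sample data and the helper name `adverb_noun_contexts` are made up for illustration, and the two regexes are the ones from the question):

```python
import re

# Patterns copied from the question.
adverb = re.compile(r'RB$')
noun = re.compile(r'NN')

def adverb_noun_contexts(tags, window=2):
    """Return the context around each adverb that is directly followed by a noun."""
    hits = []
    # Stop one item early so that tags[n + 1] always exists.
    for n in range(len(tags) - 1):
        # Index the list (tags), not the current string (w).
        if adverb.search(tags[n]) and noun.search(tags[n + 1]):
            # Clamp the start: a negative start would count from the end of the list.
            start = max(n - window, 0)
            hits.append(' '.join(tags[start:n + window + 1]))
    return hits

# Hypothetical sample data, one "word<TAB>tag" string per item.
sample = ['it\tPP', 'was\tVBD', 'really\tRB', 'nonsense\tNN', 'so\tRB']
for context in adverb_noun_contexts(sample):
    print(context)
```

Looping over `range(len(tags) - 1)` is one way to keep `n + 1` inside the list; the `max(...)` guard is an extra precaution for matches near the start of the list, where `n-2` in the original slice would go negative.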

khelwood