I'm trying to search for r'CONTENTS\.\n+CHAPTER I\.' in a text downloaded from Project Gutenberg, but I get an AttributeError because the pattern doesn't match inside my function. The same pattern does match outside the function. My code is below:

import re
from urllib import request

def gutenberg(url):
    response=request.urlopen(url)
    raw=response.read().decode('utf8')
    print(re.search(r"CONTENTS\.\n+CHAPTER I\.",raw).group())

a=gutenberg("https://www.gutenberg.org/files/76/76-0.txt")

Output:

...
print(re.search(r"CONTENTS\.\n+CHAPTER I\.",raw).group())
AttributeError: 'NoneType' object has no attribute 'group'

And outside the function:

a="""Complete









 CONTENTS.



 CHAPTER I. Civilizing"""

re.search(r"CONTENTS\.\n+CHAPTER I\.",a).group()

Output:

'CONTENTS.\n\nCHAPTER I.'

However, the search works fine inside the function when the pattern contains no newline character: print(re.search(r"CONTENTS\.",raw).group()). So I suspect I need some kind of flag.

What I've tried:

print(re.search(r"CONTENTS\.\n+CHAPTER I\.",raw,re.M).group())

pattern=re.compile(r'CONTENTS.\n+CHAPTER I.')
print(pattern.search(raw).group())

I even tried adding an extra backslash to the pattern: r"CONTENTS\.\\n+CHAPTER I\." but got the same AttributeError.

I read about flags=regex.VERSION1, but I couldn't find any information about it in the latest Python regex documentation, so I haven't tried it.

Any ideas how to search for a multiline pattern within a function?

More generally, what confuses me most is that re.search() behaves differently inside and outside the function. Is there a concept I'm not aware of?

Thanks in advance! I'll appreciate any help!

Elena

1 Answer

No, there isn't something special, and it doesn't matter whether you're "in a function" or not. The data you pulled down from that URL simply doesn't match your pattern: it has \r\n line endings and not \n. Your "outside the function" test case with the literal string is testing on different data which does match the pattern.
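As a minimal sketch of the point above, the snippet below uses a short hypothetical sample string (not the real download) with \r\n line endings to show why the original pattern fails, and two possible ways around it: making the pattern tolerate an optional \r, or normalizing the line endings first. Note that re.M would not help here, since it only changes how ^ and $ behave.

```python
import re

# Hypothetical sample standing in for the downloaded text, with \r\n line endings.
raw = "Complete\r\n\r\nCONTENTS.\r\n\r\nCHAPTER I. Civilizing"

# The original pattern expects "\n" right after "CONTENTS.", but "\r" comes first.
strict = re.search(r"CONTENTS\.\n+CHAPTER I\.", raw)
print(strict)  # None

# Option 1: allow an optional "\r" before each "\n".
tolerant = re.search(r"CONTENTS\.(?:\r?\n)+CHAPTER I\.", raw)
print(repr(tolerant.group()))  # 'CONTENTS.\r\n\r\nCHAPTER I.'

# Option 2: normalize line endings, then reuse the original pattern.
normalized = raw.replace("\r\n", "\n")
match = re.search(r"CONTENTS\.\n+CHAPTER I\.", normalized)
print(repr(match.group()))  # 'CONTENTS.\n\nCHAPTER I.'
```

Printing repr(raw[:200]) on the real data is a quick way to check which line endings it actually contains.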

hobbs