21

I'm currently getting this error and don't know what it means. It's a Scrapy Python project, and this is the error I'm seeing:

  File "/bp_scraper/bp_scraper/httpmiddleware.py", line 22, in from_crawler
    return cls(crawler.settings)
  File "/bp_scraper/bp_scraper/httpmiddleware.py", line 12, in __init__
    if parts[1]:
TypeError: '_sre.SRE_Match' object has no attribute '__getitem__'

The code:

import re
import random
import base64
from scrapy import log
class RandomProxy(object):
    def __init__(self, settings):
        self.proxy_list = settings.get('PROXY_LIST')
        f = open(self.proxy_list)

        self.proxies = {}
        for l in f.readlines():
            parts = re.match('(\w+://)(\w+:\w+@)?(.+)', l)

            if parts[1]:
                parts[1] = parts[1][:-1]

            self.proxies[parts[0] + parts[2]] = parts[1]

        f.close()
    @classmethod
    def from_crawler(cls, crawler):
        return cls(crawler.settings)

Thanks in advance for your help!

user3403945
  • 1
    `re.match` returns a [match object](http://docs.python.org/2/library/re.html#match-objects), which does not overload the `[]` operator. Did you mean `re.findall` instead? – Two-Bit Alchemist Mar 10 '14 at 23:51
  • 9
    With Python 3.6 or later this should work (see [documentation](https://docs.python.org/3/library/re.html#re.match.__getitem__)) – ted Jun 09 '17 at 17:52

2 Answers

21

The result of a re.match call is an SRE_Match object (or None if nothing matched), which before Python 3.6 does not support the [] operator (a.k.a. __getitem__). I think you want:

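if parts is not None:
    if parts.group(2):
        <blah>

Groups are numbered from 1 (group(0) is the entire match), so the optional user:pass@ part of your pattern is parts.group(2), not parts.group(1). Unfortunately, you cannot assign to a group of a match object, and the string returned by parts.group(2) is immutable, so you'll have to make another variable to hold the changes you want to make to it.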
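
As a rough sketch of how the parsing step could look with group() instead of indexing: the helper name parse_proxy_line and the compiled constant PROXY_RE are made up for illustration and are not part of the question's code. It assumes the same pattern as in the question and, like the original loop, stores None when a line has no credentials.

```python
import re

# Same pattern as in the question, written as a raw string.
PROXY_RE = re.compile(r'(\w+://)(\w+:\w+@)?(.+)')


def parse_proxy_line(line):
    """Return (proxy_url, user_pass) for one line, or None if it does not match.

    Hypothetical helper for illustration only.
    """
    match = PROXY_RE.match(line.strip())
    if match is None:
        # re.match returns None when the line does not fit the pattern.
        return None

    scheme = match.group(1)       # e.g. 'http://'
    credentials = match.group(2)  # e.g. 'user:pass@', or None if absent
    host = match.group(3)         # e.g. '10.0.0.1:8080'

    # Cut the trailing '@' into a new variable; the match itself is read-only.
    user_pass = credentials[:-1] if credentials else None
    return scheme + host, user_pass
```

Inside the loop from the question this could be used as `result = parse_proxy_line(l)`, skipping the line when `result` is None and otherwise storing `self.proxies[result[0]] = result[1]`.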

dg99
  • You can get rid of the `is not None` part. `if parts` is the same as `if parts is not None`. – shaktimaan Mar 11 '14 at 00:03
  • 2
    I choose to be as explicit as possible in my Python code. `if parts` is not exactly the same as `if parts is not None`; the former tests whether the current boolean interpretation of `parts` is `True`, while the latter explicitly tests whether `parts` has a value of `None`. In this example they would evaluate to the same thing, but I would rather not teach readers shortcuts that turn out not to work in some future case. – dg99 Mar 11 '14 at 00:08
  • 7
    This is no longer the case in Python 3.6, where you can use `[]`, see [docs](https://docs.python.org/3/whatsnew/3.6.html#re). – naktinis Nov 20 '17 at 13:08
8

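You cannot index the match object like this:

        if parts[1]:
            parts[1] = parts[1][:-1]

Instead, read the group with group(). Groups are numbered from 1, so the optional user:pass@ part of your pattern is group 2:

        if parts and parts.group(2):
            matched = parts.group(2)[:-1]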

More on regex matched groups here
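
If zero-based indexing is what you were after, a small sketch of two alternatives follows. The sample URL is invented for the example. groups() returns the capture groups as an ordinary tuple, and subscripting the match object itself only works on Python 3.6 or later, so the sketch guards that with a version check.

```python
import re
import sys

# Sample input is made up for illustration; the pattern is the one from the question.
match = re.match(r'(\w+://)(\w+:\w+@)?(.+)', 'http://user:pass@example.com:3128')

# groups() returns all capture groups as a plain tuple, indexed from 0,
# so it can be unpacked or indexed like any other tuple.
scheme, credentials, host = match.groups()

# Subscripting a match object requires Python 3.6 or later;
# match[2] is then equivalent to match.group(2).
if sys.version_info >= (3, 6):
    same_credentials = match[2]
else:
    same_credentials = match.group(2)
```

Either way the values are read-only, so any trimmed result (such as the credentials without the trailing '@') has to go into its own variable.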

shaktimaan