I'm getting the following error at the line below and I'm not sure why. It worked before, but it broke somewhere along the way while I was debugging. Any help? I'm not sure how much code is useful to post; if this is not enough, let me know and I'll update. Basically, I'm trying to extract all the links from a previously jumbled list into a single list.

exceptions.TypeError: 'generator' object has no attribute '__getitem__'

        item['playerurl'] = re.findall(r'"[^"]*"',"".join(item['playerurl']))                                       #used to parse

Edit: item declaration in item file

class TeamStats(Item):
    # define the fields for your item here like:
    # name = scrapy.Field()

    team = Field()
    division = Field()
    rosterurl = Field()
    player_desc = Field()
    playerurl = Field()
    pass

I'll just post my entire code:

    ##the above code is for the real run but the below code is just for testing as it hits less pages 
    division = response.xpath('//div[@id="content"]//div[contains(@class, "mod-teams-list-medium")]')
    for team in response.xpath('//div[@id="content"]//div[contains(@class, "mod-teams-list-medium")]'):                              #goes through all teams in each division
        item = TeamStats()                                                                                  #creates new TeamStats item


        item['division'] = division.xpath('.//div[contains(@class, "mod-header")]/h4/text()').extract()[0]  #extracts the text which represents division, team and roster url
        item['team'] = team.xpath('.//h5/a/text()').extract()[0]
        item['rosterurl'] = "http://espn.go.com" + team.xpath('.//div/span[2]/a[3]/@href').extract()[0]

        request = scrapy.Request(item['rosterurl'], callback = self.parseWPNow)                             #opens up roster url to parse player data 
        request.meta['play'] = item

        yield request                                                                                       #run the request through parseWPNow





def parseWPNow(self, response):                                                                                 #after each request in parse, this is run

    item = response.meta['play']                                                                                #current item gets restored through meta tag
    item = self.parseRoster(item, response)                                                                     #goes through and takes basic player data while filling playerurl (needed for next step)                                                                    
    item = self.parsePlayer(item, response)                                                                     #gets player stats

    return item                                                                                                 #returns filled item object and on to next item

def parseRoster(self, item, response):
    players = Player()                                                                                          #creates player object to be filled
    int = 0
    for player in response.xpath("//td[@class='sortcell']"):                                                    #fills basic player stats in each player object
        players['name'] = player.xpath("a/text()").extract()[0]
        players['position'] = player.xpath("following-sibling::td[1]/text()").extract()[0]
        players['age'] = player.xpath("following-sibling::td[2]/text()").extract()[0]
        players['height'] = player.xpath("following-sibling::td[3]/text()").extract()[0]
        players['weight'] = player.xpath("following-sibling::td[4]/text()").extract()[0]
        players['college'] = player.xpath("following-sibling::td[5]/text()").extract()[0]
        players['salary'] = player.xpath("following-sibling::td[6]/text()").extract()[0]
        players['height'] = players['height']
        yield players
    item['playerurl'] = response.xpath("//td[@class='sortcell']/a").extract()                                   #playerurl is important for extracting the data info
    yield item

def parsePlayer(self,item,response):                                                                            

    item['playerurl'] = re.findall(r'"[^"]*"',"".join(item['playerurl']))                                       #used to parse
    for each in item['playerurl']:                                                                              #goes through each player in url and sets up requests1 to extract requests
        each = each[1:-1]
        each = each[:30]+"gamelog/"+each[30:]
        request1 = scrapy.Request(each, callback = self.parsePlayerNow)
        yield request1
user3042850

1 Answer

It looks like item is not a dictionary-like object at the point where the error occurs. It's a generator instead.

You should check your logic and find where item is being rebound to a generator.

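Note that a generator is an iterator that produces its values lazily, one at a time. A generator expression looks like a list comprehension but uses parentheses, and its result cannot be indexed. For example: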

gen = (e for e in [1,2])
print type(gen)
# <type 'generator'>

And if you try the following:

gen[0]

you get the exception:

TypeError: 'generator' object has no attribute '__getitem__'
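
As a small sketch of two ways around this, you can either pull values out with `next()` or materialise the generator into a list before indexing. The block is written to run on Python 3, where the same mistake is reported as `'generator' object is not subscriptable`; the variable names are only illustrative.

```python
gen = (e for e in [1, 2])

# Indexing a generator fails because it does not implement __getitem__.
try:
    gen[0]
    indexing_failed = False
except TypeError:
    indexing_failed = True

# Option 1: pull values one at a time.
first = next(gen)

# Option 2: materialise whatever is left into a list, which supports indexing.
rest = list(gen)

print(indexing_failed, first, rest)
```

Note that a generator can only be consumed once, so `rest` holds just the values that `next()` had not already taken.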

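Edit: Yes, item is a generator. Any function whose body contains a yield statement returns a generator when it is called, so both `item = self.parseRoster(item, response)` and `item = self.parsePlayer(item, response)` rebind item to a generator instead of your TeamStats item. See this example: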

def f():
    a = 1
    yield a + 1

print f()
# <generator object f at 0x0000000002A793A8>
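
A related detail may explain why the traceback points at a line inside a method rather than at the call site: the body of a function containing `yield` does not start running when the function is called, only when the returned generator is iterated. A minimal sketch, written for Python 3 and using a hypothetical `calls` list to record when the body actually runs:

```python
calls = []

def f():
    calls.append("body started")   # runs only once iteration begins
    a = 1
    yield a + 1

result = f()
started_on_call = list(calls)      # snapshot taken right after calling f()

values = list(result)              # iterating the generator runs the body
print(started_on_call, calls, values)
```

Here `started_on_call` is still empty after the call, and the body only runs once `list(result)` iterates the generator.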
Christian Tapia
  • I have declared it as an item; I'm pretty sure I'm not using it as a generator – user3042850 Jan 16 '15 at 15:45
  • Do you understand the difference between `yield` and `return`? – Håken Lid Jan 16 '15 at 15:47
  • @user3042850 as Haken Lid said, there is a difference between `yield` and `return`. – Christian Tapia Jan 16 '15 at 15:50
  • Not 100%. If you're pointing at `yield item` in the parseRoster function: I had it yield item because I wasn't able to mix return and yield in the same function. That's also why I was forced to create another function, parseWPNow, rather than just returning item at the end of my initial for loop. I understand that yield is used for generators so you can reuse a function with different input, but I haven't mastered it yet – user3042850 Jan 16 '15 at 15:52
  • The problem actually is in `yield request1` in the `parsePlayer` method. – Christian Tapia Jan 16 '15 at 15:55
  • I'm using yield to yield a different item (the players item). Are you saying this is affecting my `item` object? – user3042850 Jan 16 '15 at 15:57
  • Yes. See the last example in @Christian's answer. – Håken Lid Jan 16 '15 at 15:58
  • You are doing `item = self.parsePlayer(item, response)`. `parsePlayer` uses `yield`, so calling it gives you a generator. – Christian Tapia Jan 16 '15 at 15:59
  • In your code you are reassigning to `item` all the time. Dynamic languages such as Python let you do that, but it can make debugging very hard. In this case, `item` suddenly is a generator, and it's hard to figure out exactly when it happened. It's not a good idea to reuse variable names too much. – Håken Lid Jan 16 '15 at 16:00
  • Nice, thanks guys. I added an empty array, passed it into parseRoster, filled it with item['playerurl'], then passed this into parsePlayer rather than using item – user3042850 Jan 16 '15 at 16:07
  • @user3042850 no problem. See you! – Christian Tapia Jan 16 '15 at 16:08
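
One way to read the fix described in the last comments is to keep the helper that gathers URLs as a plain function returning a list, and reserve `yield` for the step that emits requests. The sketch below does not use Scrapy at all: `collect_player_urls`, `player_requests`, the tuple standing in for a request, and the example.com markup are all hypothetical, chosen only to show the shape of that split.

```python
import re

def collect_player_urls(anchor_html):
    """Plain function (no yield): returns a list of the quoted values found."""
    quoted = re.findall(r'"[^"]*"', "".join(anchor_html))
    return [q[1:-1] for q in quoted]       # strip the surrounding quotes

def player_requests(urls):
    """Generator: yields one request-like tuple per URL."""
    for url in urls:
        yield ("GET", url)                 # stand-in for scrapy.Request(url, ...)

# Hypothetical anchor markup, shaped like the strings .extract() would return.
anchors = ['<a href="http://example.com/p/1">A</a>',
           '<a href="http://example.com/p/2">B</a>']

urls = collect_player_urls(anchors)        # a real list, safe to index
requests = list(player_requests(urls))
print(urls[0])
print(requests)
```

Because `collect_player_urls` has no `yield`, its result can be indexed and passed around under its own name, and the item variable never has to be reassigned.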