Scrapy: extracting info from a specific div

Question

This is my code:

def parse(self, response):
    return scrapy.FormRequest.from_response(
        response,
        formdata={'uuid': 'user', 'password': 'cons'},
        callback=self.after_login
    )

def after_login(self, response):
    # check that login succeeded before going on
    if "authentication failed" in response.body:
        self.log("Login failed", level=log.ERROR)
    else:
        self.log('LOGGED')
        sel = Selector(response)
        sel.xpath("//div[@class='amount cSpringGreen']/text()").extract()

But nothing appears when I execute it. After logging in to the website, it should show that information. This is the HTML code:

<h1 class="hide2"></h1>
<div id="vodaint-local" class="wrapper rhomb">
<div class="spring">
<script type="text/javascript">
<div class="mod mod-selectsizeheader vodaint-local">
<div id="mivf" class="content">
<div id="navigation-breadcrumb" class="belt">
<div class="belt">
<div class="miVFR">
<div class="mainMiVF cf">
<div class="headerMiVF cf">
<div class="bodyMiVF cf">
<div class="mainNav" style="height: auto;">
<div class="mainContent withHeader" style="height: 585px;">
<style>
<div id="contentSpinner" style="margin-bottom: 432px; display: none;">
<script>
<section>
<script type="text/javascript">
<div class="mainContentContainer home">
<div class="headerBanner">
<script type="text/javascript">
<div class="lineContainer ">
<h6 class="topHeading prepago"> </h6>
<div class="columnGroup cf">
<div class="column newPromo">
<div class="columnContent">
<p class="cTitle"> Tu saldo</p>
--THIS IS THE INFO I WANT TO SHOW--
<div class="amount cSpringGreen">
0,
<span> 96</span>
€
</div>

Thanks!

EDIT: You can find the whole HTML file in this pastebin: http://pastebin.com/B2HpACCw. What I want to show after the login is "0'96". Thanks!

The HTML is a little bit weird. There are lots of unclosed `div` and `script` elements, and even a suspicious `style` element. – dreyescat, Nov 22 '14 at 19:15
I'm not sure I understand the exact problem: are any of the messages in the ``after_login`` method being printed at all? Is the spider failing to log in to the site, or is it logging in but the data is not being scraped? – Elias Dorneles, Nov 22 '14 at 19:29
The login works perfectly and it shows LOGGED on the screen. The problem is that after that it doesn't show anything. – AngelaBR, Nov 23 '14 at 20:18
I just ran the XPath against the HTML that you posted and it works. Can you post the real URL? Then we can test your whole code. – Nima Soroush, Nov 23 '14 at 23:30
I can't post the real URL, as it's a page where you enter after a login and it shows personal information, but I'm going to edit my question with the complete HTML code. – AngelaBR, Nov 26 '14 at 11:05
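
To illustrate the comment that the XPath matches the posted HTML, here is a minimal sketch that uses only the standard library's xml.etree.ElementTree as a stand-in for Scrapy's Selector. It assumes a well-formed version of the fragment (the closing tags and the FRAGMENT name are added here for illustration and are not the real page). It also shows that the text is split across several nodes, so only part of the amount sits directly inside the div.

```python
import xml.etree.ElementTree as ET

# Well-formed stand-in for the fragment in the question (closing tags added).
FRAGMENT = """
<div class="columnContent">
  <p class="cTitle"> Tu saldo</p>
  <div class="amount cSpringGreen">
    0,
    <span> 96</span>
    €
  </div>
</div>
"""

root = ET.fromstring(FRAGMENT)

# Same predicate as in the question: match on the exact class attribute value.
div = root.find(".//div[@class='amount cSpringGreen']")

# Text that sits directly inside the div, before the <span>.
direct_text = div.text.strip()

# All text under the div, including the <span>, with whitespace removed.
full_text = "".join("".join(div.itertext()).split())

print(direct_text)
print(full_text)
```

In Scrapy itself, `/text()` selects only the text nodes that are direct children of the div, so the " 96" inside the span is not among them. An XPath such as string(//div[@class='amount cSpringGreen']) returns the concatenated text instead. Neither explains an empty result on its own; an empty list means the div was not matched in the response that Scrapy received.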

Tushar Gupta · Answer 1 · 2014-11-26T11:38:07.990

0

Store it in an item. Edit items.py:

import scrapy

class TestItem(scrapy.Item):
    text = scrapy.Field()

and then in the spider:

item = TestItem()
item['text'] = sel.xpath("//div[@class='amount cSpringGreen']/text()").extract()
print(item['text'])

edited Nov 26 '14 at 11:38

answered Nov 26 '14 at 11:20


I had to change desc = scrapy.Field() to text = scrapy.Field(). After that it prints this: 2014-11-26 12:35:48+0100 [login] DEBUG: LOGGED []. The item is empty. – AngelaBR Nov 26 '14 at 11:37
Sorry for the discrepancy, I just forgot to update it. Now you have to include the item in items.py and then access it from your spider. – Tushar Gupta Nov 26 '14 at 11:39
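
Since the item comes back as an empty list, one way to narrow the problem down is to check whether the div is present at all in the body that the spider received. The sketch below is only a suggestion, not something confirmed in this thread; the function name diagnose and its marker parameter are hypothetical, and it expects the response body already decoded to text.

```python
def diagnose(body, marker="amount cSpringGreen"):
    """Return a short hint about why the XPath may have matched nothing.

    body   -- the response body as text
    marker -- the class attribute value the XPath looks for
    """
    if marker in body:
        # The div is in the response, so the expression itself is suspect.
        return "marker present: re-check the XPath expression"
    # The div is missing, so the spider did not receive the same HTML
    # as the one copied from the browser.
    return "marker absent: the response differs from the page seen in the browser"


print(diagnose('<div class="amount cSpringGreen">0,</div>'))
print(diagnose("<html><body></body></html>"))
```

If the marker turns out to be absent, one possible explanation is that the page fills in that div with JavaScript after loading, which would fit the many script elements in the posted HTML, but that is an assumption that would need to be checked against the real response.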
