
I have a working Scrapy script that extracts data from a table. However, it saves each value on its own row under a 'name' header, because that is the order in which the original data is extracted:

name 
firstitem
seconditem
...
lastitem

How can I save this dict in row format, without 'name', like this:

21:00 2019/02/22, firstitem, seconditem,...,lastitem

I already have a list that contains the current time, so I need to rewrite this dict as a list in order to write it to CSV.

EDIT: I replaced the dictionary's key with the current_time variable, but the problem with the output format still exists.

import scrapy as sp
from time import gmtime, strftime

current_time = strftime("%Y-%m-%d %H:%M:%S", gmtime())

class tableSpider(sp.Spider):
    name='spider'
    start_urls = ['example.com']  # Can't expose the real URL

    def parse(self, response):
        CLASS_SELECTOR = '.col-xs-3'
        for ex in response.css(CLASS_SELECTOR):
            NAME_SELECTOR = 'a:not(.dep) ::text'
            yield {
                current_time: ex.css(NAME_SELECTOR).extract_first(),
            }

from scrapy.crawler import CrawlerProcess

c = CrawlerProcess({
    'USER_AGENT': 'Chrome/72.0.3626.119',
    'FEED_FORMAT': 'csv',
    'FEED_URI': 'booking.csv',
})
c.crawl(tableSpider)
c.start()

EDIT: Target HTML code with the values replaced (I need the text of every 'ITEM'):

<div class="table-responsive catalog">
                <table class="table table-striped table-bordered">
                    <tr class="info">
                        <th class="text-center">#</th>
                        <th>table</th>
                        <th>description</th>
                    </tr>
                                    <tr>
                        <td class="text-center col-xs-1 text-valign">1</td>
                        <td class="col-xs-3">
                                                                                        <a href="scr" target="_blank">ITEM</a>
                                                        <br/>
                            <small>date</small>
                        </td>
                        <td class="col-xs-7 text-valign">adv</td>
                                            </tr>
                                    <tr style="color: #ffffff;background-color: #000000">
                        <td class="text-center col-xs-1 text-valign">2</td>
                        <td class="col-xs-3">
                            <a class="dep" href="scr" title="22">22</a>                                                            <a href="scr" target="_blank">ITEM</a>
                                                        <br/>
                            <small>date</small>
                        </td>
                        <td class="col-xs-7 text-valign">adv</td>
                                            </tr>
                                    <tr>
                        <td class="text-center col-xs-1 text-valign">3</td>
                        <td class="col-xs-3">
                                                                                        <a href="scr" target="_blank">ITEM</a>
                                                        <br/>
                            <small>date</small>
                        </td>
                        <td class="col-xs-7 text-valign">adv</td>
Rickoshet
  • Possible duplicate of [Write to a csv file scrapy](https://stackoverflow.com/questions/20719263/write-to-a-csv-file-scrapy) – efirvida Feb 22 '19 at 19:33
  • Can you add some example HTML with the real values replaced or removed? Your use of `col-xs-3` suggests it is not using `<table>` but rather a Bootstrap (or similar) grid.
    – malberts Feb 23 '19 at 05:51
  • Sure, but I don't think this will help with my problem, because the extracted values are correct, just in the wrong format. I edited the question and added the HTML below. – Rickoshet Feb 23 '19 at 06:14

1 Answer

Scrapy's Item / ItemLoader mechanism serves this purpose. Something like:

Define an Item for data row:

import scrapy

class DataRowItem(scrapy.Item):
    current_time = scrapy.Field()
    firstitem = scrapy.Field()
    ...

Then declare the matching ItemLoader:

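from scrapy.loader import ItemLoader
from itemloaders.processors import TakeFirst  # older Scrapy: scrapy.loader.processors

class DataRowItemLoader(ItemLoader):
    default_item_class = DataRowItem
    default_output_processor = TakeFirst()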
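
A note on why `TakeFirst()` is there: loader methods such as `add_css` collect every match into a list, and the output processor decides what ends up in the field. The sketch below is a plain-Python stand-in (the name `take_first` is hypothetical, and this is not Scrapy's own code) for the documented behaviour of returning the first value that is neither `None` nor an empty string.

```python
def take_first(values):
    """Stand-in for the documented TakeFirst behaviour (not Scrapy's code)."""
    for value in values:
        # Skip missing matches and empty strings; keep anything else.
        if value is not None and value != "":
            return value
    return None  # nothing usable was collected


# Placeholder data standing in for what a loader might have collected.
collected = [None, "", "ITEM", "another ITEM"]
first = take_first(collected)
print(first)
```

With this output processor, each field ends up holding a single string rather than a one-element list, which is usually what a CSV cell should contain.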

In the parse function:

def parse(self, response):
    loader = DataRowItemLoader(DataRowItem(), response=response)
    # Extract the data here, using loader methods
    loader.add_css('current_time', ...)
    loader.add_css('firstitem', ...)
    ...
    yield loader.load_item()  # One item = one line

Then serialize the items to CSV using, for example, the method described in "Export csv file from scrapy (not via command line)".
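
If the goal is a single line like the one shown in the question (timestamp first, then every name), one option is to gather the extracted names into a list and write them with Python's standard `csv` module. This is only a sketch under assumptions: `build_row` and `rows_to_csv` are hypothetical helper names, and the `names` list is placeholder data standing in for whatever the spider extracted.

```python
import csv
import io
from time import gmtime, strftime


def build_row(timestamp, names):
    """Return one flat row: the timestamp followed by every extracted name."""
    # Drop None values, which extract_first() returns when nothing matches.
    return [timestamp] + [name for name in names if name is not None]


def rows_to_csv(rows):
    """Serialize rows to CSV text; swap StringIO for an open file to save them."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


current_time = strftime("%Y-%m-%d %H:%M:%S", gmtime())
names = ["firstitem", None, "seconditem", "lastitem"]  # placeholder data
print(rows_to_csv([build_row(current_time, names)]))
```

Because the whole table becomes one row, this sidesteps the "one item = one line" behaviour of the feed exporter, at the cost of writing the file yourself.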

matthieu.cham