
I want to write to a CSV file in Scrapy.

 for rss in rsslinks:
     item = AppleItem()
     item['reference_link'] = response.url
     base_url = get_base_url(response)
     item['rss_link'] = urljoin_rfc(base_url, rss)
     #item['rss_link'] = rss
     items.append(item)
     #items.append("\n")
 f = open(filename, 'a+')    #filename is apple.com.csv
 for item in items:
     f.write("%s\n" % item)

My output is this:

{'reference_link': 'http://www.apple.com/',
 'rss_link': 'http://www.apple.com/rss '}
{'reference_link': 'http://www.apple.com/rss/',
 'rss_link': 'http://ax.itunes.apple.com/WebObjects/MZStore.woa/wpa/MRSS/newreleases/limit=10/rss.xml'}
{'reference_link': 'http://www.apple.com/rss/',
 'rss_link': 'http://ax.itunes.apple.com/WebObjects/MZStore.woa/wpa/MRSS/newreleases/limit=25/rss.xml'}

What I want is this format:

reference_link               rss_link  
http://www.apple.com/     http://www.apple.com/rss/
jonrsharpe
blackmamba
  • Look at the csv module in the standard library – mmmmmm Dec 21 '13 at 13:05
  • I found this class scrapy.contrib.exporter.CsvItemExporter(file, include_headers_line=True, join_multivalued=', ', **kwargs), but I don't know how to use it with my code. – blackmamba Dec 21 '13 at 13:10

6 Answers


Simply crawl with `-o csv`, like:

scrapy crawl <spider name> -o file.csv -t csv
Guy Gavriely
  • I am creating a file for each domain. This will create just one file for all the domains. – blackmamba Dec 21 '13 at 15:55
  • By modifying your `pipelines.py` file. You don't need any of the above code and can control the output formatting, order and when to export. I did something similar [HERE](http://stackoverflow.com/questions/20753358/how-can-i-use-the-fields-to-export-attribute-in-baseitemexporter-to-order-my-scr), defining my own `spider_opened(...)` function. – not2qubit Dec 26 '13 at 12:42
    In the newer scrapy versions it seems to be: `scrapy runspider -o file.csv -t csv` – Micronax Sep 09 '17 at 11:01
  • I was wondering, in such a case, if we have 2 different parse functions and want to write a separate file for each of them, can it be done? – Aayush Agrawal Mar 10 '18 at 11:03
  • Can you explain what the `-t` parameter is? – Hemant Kumar Apr 25 '19 at 04:26
  • `-t` specifies the [format](https://scrapy.readthedocs.io/en/latest/topics/feed-exports.html) for dumping items. – daaawx Jun 14 '19 at 16:24
  • Can you please explain the difference between `-o` and `-O`? – y.y May 28 '21 at 14:17

This is what worked for me using Python 3:

scrapy runspider spidername.py -o file.csv -t csv
ascripter
jwalman

The best approach to solve this problem is to use Python's built-in csv module.

import csv

# newline='' prevents blank lines between rows on Windows;
# the with-statement closes the file when writing is done
with open('Output_file.csv', 'w', newline='') as output_file:
    fieldnames = ['reference_link', 'rss_link']  # header row
    writer = csv.DictWriter(output_file, fieldnames=fieldnames)
    writer.writeheader()
    for rss in rsslinks:
        base_url = get_base_url(response)
        # write one row per RSS link
        writer.writerow({'reference_link': response.url,
                         'rss_link': urljoin_rfc(base_url, rss)})
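Since `rsslinks`, `response`, and `urljoin_rfc` only exist inside the spider, here is a self-contained sketch of the same csv.DictWriter approach with placeholder data (the hard-coded links below are illustrative, not scraped):

```python
import csv

# Placeholder data standing in for response.url and the joined RSS links
reference_link = 'http://www.apple.com/'
rss_links = ['http://www.apple.com/rss/',
             'http://images.apple.com/main/rss/hotnews/hotnews.rss']

with open('Output_file.csv', 'w', newline='') as output_file:
    writer = csv.DictWriter(output_file,
                            fieldnames=['reference_link', 'rss_link'])
    writer.writeheader()                       # header row
    for rss in rss_links:                      # one row per link
        writer.writerow({'reference_link': reference_link, 'rss_link': rss})

# Read the file back to check the result
with open('Output_file.csv', newline='') as output_file:
    rows = list(csv.reader(output_file))
print(rows[0])  # ['reference_link', 'rss_link']
```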
Anurag Misra

You need to

  1. Write your header row; then
  2. Write the entry rows for each object.

You could approach it like:

fields = ["reference_link", "rss_link"] # define fields to use
with open(filename,'a+') as f: # open the output file
    f.write("{}\n".format('\t'.join(fields))) # write header
    for item in items:
        f.write("{}\n".format('\t'.join(str(item[field]) 
                              for field in fields))) # write items

Note that "{}\n".format(s) gives the same result as "%s\n" % s.
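Run against sample data, this produces the tab-separated layout the question asks for. A self-contained demo (the item below is hard-coded in place of the scraped ones, and the filename is arbitrary):

```python
# Hard-coded items standing in for the scraped AppleItem objects
items = [
    {'reference_link': 'http://www.apple.com/',
     'rss_link': 'http://www.apple.com/rss/'},
]
fields = ["reference_link", "rss_link"]

with open('apple.com.tsv', 'w') as f:
    f.write("{}\n".format('\t'.join(fields)))  # write header
    for item in items:
        f.write("{}\n".format('\t'.join(str(item[field])
                              for field in fields)))  # write items

# Read the file back to inspect the result
with open('apple.com.tsv') as f:
    lines = f.read().splitlines()
print(lines)
```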

jonrsharpe
Set the feed output file in your spider's custom_settings:

custom_settings = {
    'FEED_URI': 'Quotes.csv'
}
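For context, custom_settings belongs inside the spider class. A minimal sketch, assuming Scrapy's feed exports (the spider name, start URL, and yielded field are placeholders; newer Scrapy versions use the FEEDS dict instead of FEED_URI/FEED_FORMAT):

```python
import scrapy

class QuotesSpider(scrapy.Spider):
    name = 'quotes'
    start_urls = ['http://www.apple.com/']
    # Feed export settings: write scraped items to Quotes.csv as CSV
    custom_settings = {
        'FEED_URI': 'Quotes.csv',
        'FEED_FORMAT': 'csv',
    }

    def parse(self, response):
        # each yielded dict becomes one CSV row
        yield {'reference_link': response.url}
```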
Vinayak

Try tablib.

import tablib

dataset = tablib.Dataset()
dataset.headers = ["reference_link", "rss_link"]

def add_item(item):
    # append one row, with values in header order
    dataset.append([item.get(field) for field in dataset.headers])

for item in items:
    add_item(item)

with open(filename, 'w') as f:
    f.write(dataset.csv)

uhbif19