I have the following code that is supposed to scrape the list of headings from a Wikipedia page. In the output CSV, I expect to get the main headings in column A and the subheadings in column B. My problem is with the subheadings: I get all of them on one line, but I need each subheading in its own row. Here's my try (but I got only the first subheading, not all of them):
import scrapy


class WikipediaTocSpider(scrapy.Spider):
    name = 'wikipedia_toc'
    start_urls = ['https://en.wikipedia.org/wiki/Python_(programming_language)']

    def parse(self, response):
        for toc in response.css('.toclevel-1'):
            yield {
                'heading': toc.css('span.toctext::text').get(),
                'sub_headings': '\n'.join(toc.css('li.toclevel-2 a span.toctext::text').getall())
            }
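A minimal sketch of one common approach, assuming the page still serves the `.toclevel-1`/`.toclevel-2` markup used above: instead of joining the subheadings with `'\n'` into a single cell, yield one item per subheading and repeat the heading on every row. The helper `toc_rows` and the field name `sub_heading` are hypothetical, chosen only for illustration; the commented-out `parse` method shows where the helper could plug into the spider. A main heading without subheadings still gets one row, with an empty second column.

```python
from typing import Dict, Iterable, Iterator, Optional


def toc_rows(heading: Optional[str], sub_headings: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Hypothetical helper: yield one item (one CSV row) per subheading."""
    heading = (heading or "").strip()
    # Drop None/whitespace-only entries and trim the rest.
    subs = [s.strip() for s in sub_headings if s and s.strip()]
    if not subs:
        # Keep the main heading even when it has no subheadings.
        yield {"heading": heading, "sub_heading": ""}
        return
    for sub in subs:
        # Repeating the heading keeps column A filled on every row.
        yield {"heading": heading, "sub_heading": sub}


# Inside the spider (requires Scrapy), parse could delegate to the helper:
#
#     def parse(self, response):
#         for toc in response.css('.toclevel-1'):
#             heading = toc.css('span.toctext::text').get()
#             subs = toc.css('li.toclevel-2 a span.toctext::text').getall()
#             yield from toc_rows(heading, subs)
```

Because each yielded item becomes one CSV row, the same `scrapy runspider wikipedia_toc.py -o output.csv` command should then produce one subheading per row instead of one multi-line cell.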
I run this code from PowerShell like this:
scrapy runspider wikipedia_toc.py -o output.csv
How can I remove the empty lists?
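If the "empty lists" are the rows for headings that have no subheadings (with the code above, `'\n'.join([])` yields an empty string for those), a minimal sketch is to filter such items out before they are yielded. The helper name `drop_empty` is hypothetical; the default `sub_headings` key matches the spider in the question, and an equivalent inline check would be `if item['sub_headings']: yield item`.

```python
from typing import Dict, Iterable, Iterator, Optional


def drop_empty(
    items: Iterable[Dict[str, Optional[str]]], key: str = "sub_headings"
) -> Iterator[Dict[str, Optional[str]]]:
    """Hypothetical helper: yield only items whose `key` field has non-whitespace text."""
    for item in items:
        value = item.get(key) or ""
        if value.strip():
            yield item


# Inside the spider (requires Scrapy), parse could wrap its items:
#
#     def parse(self, response):
#         items = (
#             {
#                 'heading': toc.css('span.toctext::text').get(),
#                 'sub_headings': '\n'.join(
#                     toc.css('li.toclevel-2 a span.toctext::text').getall()
#                 ),
#             }
#             for toc in response.css('.toclevel-1')
#         )
#         yield from drop_empty(items)
```

Passing a different `key` lets the same filter work if the items are later reshaped, for example to one subheading per row.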