I am new to Scrapy. In items.py, I declare two item classes called ItemClass1 and ItemClass2. A spider method, parseUrl, gets the HTML, scrapes the data, and puts it into a separate list for each item class.

e.g:
C1Items = []
C1Item = ItemClass1()
#scrape data
C1Items.append(C1Item)
...
C2Items = []
C2Item = ItemClass2()
#scrape data
C2Items.append(C2Item)
...

Finally, C1Items and C2Items contain the required data.

return C1Items #will pass ItemClass1 data to pipeline
return C2Items #will pass ItemClass2 data to pipeline

Could you please advise on the best way to pass both C1Items and C2Items to the pipeline?

Harry

2 Answers

Either combine all the items of the different classes into one list and return that list, or use the yield statement:

C1Item = ItemClass1()
#scrape data
yield C1Item
...
C2Item = ItemClass2()
#scrape data
yield C2Item
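
The yield approach above can be sketched in plain Python without Scrapy installed. This is a minimal sketch under stated assumptions: the two dict subclasses are stand-ins for the classes declared in items.py, and `parse_url`, `rows`, and the field names `name` and `price` are hypothetical, used only for illustration.

```python
class ItemClass1(dict):
    """Stand-in for the ItemClass1 declared in items.py (assumption)."""


class ItemClass2(dict):
    """Stand-in for the ItemClass2 declared in items.py (assumption)."""


def parse_url(rows):
    """Hypothetical stand-in for parseUrl: yields one item at a time."""
    for row in rows:
        # Build and yield an item of the first class.
        c1_item = ItemClass1(name=row["name"])
        yield c1_item
        # Build and yield an item of the second class.
        c2_item = ItemClass2(price=row["price"])
        yield c2_item


# Hypothetical scraped data, standing in for the parsed HTML.
rows = [{"name": "a", "price": 1}, {"name": "b", "price": 2}]
items = list(parse_url(rows))
```

Each value produced by the generator is a single item, never a list, so items of both classes can flow through the same callback.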
warvariuc

Just combine the lists into one big list and return that:

return C1Items + C2Items

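or alternatively you could turn parseUrl into a generator function that yields the items one at a time. Yielding the lists themselves fails, because Scrapy expects each yielded value to be a Request, an item, or None:

for item in C1Items + C2Items:
    yield item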
Steven Almeroth
  • Hi, I tried yield but encountered an error: "ERROR: Spider must return Request, BaseItem or None, got 'list'" – Harry Dec 29 '12 at 06:16
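
Once items of both classes reach the pipeline one at a time, the pipeline can tell them apart by type. The following is a minimal sketch, not a definitive implementation: the dict subclasses are stand-ins for the classes in items.py, and `TwoItemPipeline`, `c1_seen`, and `c2_seen` are hypothetical names. Only the `process_item(self, item, spider)` method shape follows Scrapy's item pipeline convention.

```python
class ItemClass1(dict):
    """Stand-in for the ItemClass1 declared in items.py (assumption)."""


class ItemClass2(dict):
    """Stand-in for the ItemClass2 declared in items.py (assumption)."""


class TwoItemPipeline:
    """Hypothetical pipeline that handles both item classes."""

    def __init__(self):
        self.c1_seen = []
        self.c2_seen = []

    def process_item(self, item, spider):
        # Dispatch on the item's class; each call receives a single item.
        if isinstance(item, ItemClass1):
            self.c1_seen.append(item)
        elif isinstance(item, ItemClass2):
            self.c2_seen.append(item)
        # Return the item so any later pipeline stage can process it too.
        return item


pipeline = TwoItemPipeline()
for item in [ItemClass1(name="a"), ItemClass2(price=1)]:
    pipeline.process_item(item, spider=None)
```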