
I am using Scrapy 0.20.

I want to use an item loader.

This is my code:

l = XPathItemLoader(item=MyItemClass(), response=response)
l.add_value('url', response.url)
l.add_xpath('title', "my xpath")
l.add_xpath('developer', "my xpath")
return l.load_item()

I get the result in the JSON file, but every field is a list: the url, the title, and the developer.

How can I extract a single value instead of a list?

Should I write an item pipeline for that? I hope there is a faster way.

alecxe
Marco Dinatsoli

1 Answer


You need to set an input or output processor. TakeFirst, which returns the first non-null, non-empty value from the collected list, would work perfectly in your case.

There are multiple places where you can define it, e.g. in the Item definition:

from scrapy.item import Item, Field
# On Scrapy 0.20 the processors live in scrapy.contrib.loader.processor instead.
from scrapy.loader.processors import TakeFirst

class MyItem(Item):
    url = Field(output_processor=TakeFirst())
    title = Field(output_processor=TakeFirst())
    developer = Field(output_processor=TakeFirst())

Or set default_output_processor on an XPathItemLoader() instance:

l.default_output_processor = TakeFirst()
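
To make the effect concrete, here is a minimal plain-Python sketch of what a TakeFirst-style output processor does to the lists a loader collects. The `take_first` function is a hypothetical stand-in that mirrors the documented behaviour, not Scrapy's own code, and the field values are made up for illustration.

```python
def take_first(values):
    """Stand-in for TakeFirst: first value that is neither None nor ''."""
    for value in values:
        if value is not None and value != "":
            return value
    return None


# Hypothetical data, shaped like the per-field lists a loader collects.
collected = {
    "url": ["http://example.com/app"],
    "title": ["", "Example App"],
    "developer": ["Example Dev"],
}

# Applying the processor to every field collapses each list to one value.
item = {field: take_first(values) for field, values in collected.items()}
print(item)
```

Setting `default_output_processor` applies the same processor to every field, while `output_processor` on a `Field` applies it to that field only.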
vhs
alecxe
  • Excellent, +1. I will accept once the system allows. But what is the difference between the two ways you have provided? – Marco Dinatsoli May 27 '14 at 16:21
  • Also, is there a similar way to set the output when the list is empty? Right now I am getting `null` as the value of an empty attribute. For example, some pages don't have the `title` attribute; now I get `null`, but before I was just getting `""`. – Marco Dinatsoli May 27 '14 at 16:23
  • @MarcoDinatsoli Well, speaking about the difference, [`Declaring Input and Output Processors`](http://doc.scrapy.org/en/latest/topics/loaders.html#topics-loaders-processors-declaring) explains the priority of input and output processors. `Item` class fields can be reused by multiple loaders, and each loader can have its own way of presenting the crawled data. I'd define the processor on the loader instead of on the item fields in your case. – alecxe May 27 '14 at 16:26
  • @MarcoDinatsoli Give [`Join`](http://doc.scrapy.org/en/latest/topics/loaders.html#scrapy.contrib.loader.processor.Join) a try instead of `TakeFirst`, but make sure there is only a single value in the list. – alecxe May 27 '14 at 16:27
  • Could you check my question here, please? http://stackoverflow.com/questions/24109713/scrapy-spider-sends-spider-close-signal-before-it-closes – Marco Dinatsoli Jun 08 '14 at 19:17
  • Note: you might want to set `output_processor=Identity()` on individual fields to exclude them from the `default_output_processor` when it is set to `TakeFirst()`. – TechWisdom Jun 09 '20 at 12:37
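
The `null` versus `""` difference raised in the comments comes down to what each processor returns when it is handed an empty list. The sketch below uses hypothetical plain-Python stand-ins (`take_first`, `join`, `identity`) that mirror the documented behaviour of TakeFirst, Join and Identity; they are not Scrapy's own code, and whether a loader calls the processor at all for a field with no collected values may depend on the Scrapy version.

```python
import json


def take_first(values):
    """Stand-in for TakeFirst: first value that is neither None nor ''."""
    for value in values:
        if value is not None and value != "":
            return value
    return None


def join(values, separator=" "):
    """Stand-in for Join: concatenate the values with a separator."""
    return separator.join(values)


def identity(values):
    """Stand-in for Identity: return the list unchanged."""
    return values


empty = []  # e.g. a page where the title XPath matched nothing

# take_first yields None, which JSON serializes as null.
as_null = json.dumps({"title": take_first(empty)})
# join yields an empty string instead.
as_empty_string = json.dumps({"title": join(empty)})
# identity keeps the list, which is how a field opts out of a default.
as_list = json.dumps({"title": identity(["Example App"])})

print(as_null)
print(as_empty_string)
print(as_list)
```

Note that `join` concatenates every value in the list, which is why the comment above recommends it only when the list holds a single value.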