I'm using scrapely to extract data from some HTML, but I'm having difficulties extracting a list of items.
The scrapely GitHub project documents only one simple example:
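from scrapely import Scraper
s = Scraper()
# url: a sample page; data: dict mapping field names to the values on that page
s.train(url, data)
# another_url: a similar page to extract the same fields from
s.scrape(another_url)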
This works well for the simple case the README describes:
Usage (API)
Scrapely has a powerful API, including a template format that can be edited externally, that you can use to build very capable scrapers.
What follows that section is a quick example of the simplest possible usage, that you can run in a Python shell.
However, I'm not sure how to extract the data when the page contains a list like this:
Ingredientes
- 50 gr de hojas de albahaca
- 4 cucharadas (60 ml) de piñones
- 2 - 4 dientes de ajo
- 120 ml (1/2 vaso) de aceite de oliva virgen extra
- 115 gr de queso parmesano recién rallado
- 25 gr de queso pecorino recién rallado ( o queso de leche de oveja curado)
I know I can extract this with an XPath or CSS selector, but I'm more interested in parsers that can extract the same data from similar pages.