This is my Scrapy custom regex pipeline code:
    for p in item['code']:
        for search_type, pattern in RegEx.regexp.iteritems():
            s = re.findall(pattern, p)
            if s:
                return item
            else:
                raise DropItem
And this is my RegEx code:
    class RegEx(object):
        regexp = {
            'email': re.compile('liczba'),
            'whatever': re.compile(r'mit'),
            'blu': re.compile(r'houseLocked'),
        }
These are not my real compiled regexes, just placeholders for demo purposes.
This works, but once a match is found and "return item" is triggered, the loop stops and the remaining entries and patterns are never checked.
Is it possible to continue iterating in the Scrapy pipeline?
I've been at this for 4 days and tried every permutation I can think of, but always get the same result.
I'm either missing the obvious or this is not straightforward.
If it is not possible in this manner, any recommendations for a different route would be greatly appreciated.
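
For reference, the early-exit behavior described above can be reproduced outside Scrapy. This is only a minimal sketch under a few assumptions: it uses Python 3 (`items()` instead of `iteritems()`), the function names `first_match_only` and `collect_all_matches` are hypothetical, and `ValueError` stands in for `DropItem` so the snippet runs without Scrapy installed. The second function shows one possible route: gather every match first and decide only after both loops have finished.

```python
import re

# Placeholder patterns mirroring the demo RegEx.regexp dict above.
PATTERNS = {
    'email': re.compile(r'liczba'),
    'whatever': re.compile(r'mit'),
    'blu': re.compile(r'houseLocked'),
}


def first_match_only(entries):
    """Mimics the pipeline loop: it returns or raises on the very first check."""
    for entry in entries:
        for search_type, pattern in PATTERNS.items():
            if pattern.findall(entry):
                return [(search_type, entry)]
            else:
                # Stand-in for DropItem (assumption: avoids a Scrapy import).
                raise ValueError('dropped')


def collect_all_matches(entries):
    """Checks every entry against every pattern; nothing exits the loops early."""
    matches = []
    for entry in entries:
        for search_type, pattern in PATTERNS.items():
            if pattern.findall(entry):
                matches.append((search_type, entry))
    return matches


sample = ['liczba one', 'commit two', 'nothing here']
all_matches = collect_all_matches(sample)
print(all_matches)
```

In a real pipeline, the same idea would presumably mean returning the item after the loops when `matches` is non-empty and raising `DropItem` only when it is empty.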