I want to split words that are separated by either a comma, a semicolon, or a hyphen (with a preceding space).
The reason for this is the inconsistent structure of a website I am scraping with Scrapy.
So far, I am able to split either comma- or semicolon-separated words with the following code:
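for i in response.xpath('//meta[@name="keywords"]/@content').extract():
    # Test each separator on its own: `',' or ';' in i` is always true,
    # and `i.split(',') or i.split(';')` never reaches the second split.
    if ',' in i:
        parts = i.split(',')
    elif ';' in i:
        parts = i.split(';')
    else:
        parts = [i]
    for k in parts:
        keywords.append([k.strip()])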
That works if the words are separated like:
- keyword1, keyword2, keyword3
- keyword1; keyword2; keyword3
But sometimes the keywords are also stored as follows:
keyword1 - keyword2 - keyword3
I don't know how to split them properly, because the spaces around the hyphens are giving me a headache :). Help is very much appreciated!
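
For illustration, one possible approach is a single regular expression that treats a comma, a semicolon, or a hyphen preceded by whitespace as the separator. This is only a sketch: the names `SEPARATOR` and `split_keywords` are made up for this example, and it assumes that a hyphen without a space in front of it (as in "e-mail") belongs to the keyword and should not be split.

```python
import re

# Hypothetical helper names, not part of Scrapy.
# Separator: a comma or semicolon (with optional surrounding whitespace),
# or a hyphen that has at least one whitespace character before it.
SEPARATOR = re.compile(r'\s*[,;]\s*|\s+-\s*')


def split_keywords(content):
    """Return the stripped, non-empty keywords found in `content`."""
    return [part.strip() for part in SEPARATOR.split(content) if part.strip()]
```

Inside the loop over the extracted meta content, each string `i` could then be passed to `split_keywords(i)` and every returned keyword appended as before. Requiring whitespace before the hyphen is a design choice that keeps hyphenated words intact; if the site also uses hyphens without a leading space as separators, the pattern would need to change.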