I want to scrape the article text from these two pages:
https://www.traveloffpath.com/covid-19-travel-insurance-everything-you-need-to-know/ and https://www.traveloffpath.com/what-to-do-if-your-flight-is-delayed-or-canceled/?swcfpc=1 I am stuck on the "p" tags: I don't want the "p" tags at the start and at the end of the article, such as "Share the article" and "Last updated", or the "p" tags in the bottom text that is not part of the article.
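    import requests
    from bs4 import BeautifulSoup

    # Assumed setup (not shown in the original post): fetch one of the pages above
    url = "https://www.traveloffpath.com/covid-19-travel-insurance-everything-you-need-to-know/"
    soup = BeautifulSoup(requests.get(url).text, "html.parser")

    for items in soup.find_all(class_="article"):
        # Collect the text of every heading, paragraph and list item
        Gather = '\n'.join(item.text for item in items.find_all(["h6", "h5", "h4", "h3", "h2", "h1", "p", "li"]))
        # Drop everything from the community footer onwards
        filtered = Gather.split("↓ Join the community ↓")
        # Split on "Email" to drop the text before the article body;
        # if "Email" is not found, fall back to the "... ago" timestamp
        Content = filtered[0].split("Email", 1)
        if len(Content) < 2:
            Content = filtered[0].split("ago", 1)
        # Another attempt (needs `import re`):
        # Content = re.split('ago | Read More:', Gather)
        print("Content: ", Content[-1])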
… and later slice this list with `[1:-1]` to get it without the first and last elements.
– furas Oct 04 '22 at 13:57
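The slicing idea from the comment can be sketched roughly as follows. This is only an illustration: the helper name `trim_edges`, its parameters, and the sample strings are made up here, and the list stands in for whatever the scraper collects (for example `[p.get_text(strip=True) for p in article.find_all("p")]`). How many items to skip at each end depends on the actual page.

```python
def trim_edges(texts, skip_start=1, skip_end=1):
    """Return texts without the first skip_start and last skip_end items."""
    # Compute the stop index explicitly: texts[1:-0] would be an empty list,
    # and max() keeps the slice empty when the skips exceed the list length.
    stop = max(len(texts) - skip_end, skip_start)
    return texts[skip_start:stop]


# Hypothetical stand-in for the paragraph texts scraped from the article
paragraphs = [
    "Share the article",
    "Last updated 2 days ago",
    "First real paragraph.",
    "Second real paragraph.",
    "Footer text that is not part of the article.",
]

# Skip the two header lines and the one footer line
print("\n".join(trim_edges(paragraphs, skip_start=2, skip_end=1)))
```

With the defaults, `trim_edges(texts)` behaves like `texts[1:-1]`; the extra parameters only help when more than one unwanted item sits at either end.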