
For a web-scraping project I'm working on, I'm trying to separate failed pages from successful ones. If the page title contains 指定されたページが見つかりません ("The specified page could not be found"), I want to copy the URL and write it to fail.csv. For anything else, I want to copy the URL and write it to sucess.csv.

html = 'www.abc.com'
url = BeautifulSoup(html,'html.parser').title.string
pattern = re.compile(r' 指定されたページが見つかりません')
if pattern.finditer(url):
    with open('fail.csv','w') as f:
        cw=csv.writer
        cw.writerow([url])
else:
    # move on, run some other code and write to sucess.csv
    pass

However, the regex doesn't seem to recognise 指定されたページが見つかりません.

Am I doing something wrong or missing something here?

Thanks

1 Answer


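Try installing the required packages:

sudo pip3 install requests
sudo pip3 install beautifulsoup4

(re is part of Python's standard library, so it does not need to be installed with pip.)

Then, under Python 3: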

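import requests
import re
from bs4 import BeautifulSoup

# Fetch the page itself; BeautifulSoup needs the HTML, not just the URL string
r = requests.get('https://corp.rakuten.co.jp/careers/life/')
# Decode the response body as UTF-8 so the Japanese title is read correctly
r.encoding='utf-8'
pattern = re.compile(r' 指定されたページが見つかりません')
# .title.string is the text of the page's <title> element
title = BeautifulSoup(r.text,'html.parser').title.string
# findall() returns a list, which is empty (falsy) when nothing matches
print(pattern.findall(title))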
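
One likely reason the `if` in the question always takes the first branch is that `pattern.finditer(...)` returns an iterator object, which is truthy even when there are no matches. The sketch below uses `pattern.search(...)` instead and routes each URL to one of the two CSV files. The helper names `is_not_found` and `record_url` are made up for illustration, and dropping the leading space from the pattern is an assumption (the title may start directly with the message).

```python
import csv
import re

# The "page not found" message from the question, without the leading space
# (assumption: the title may start directly with this text).
NOT_FOUND = "指定されたページが見つかりません"
pattern = re.compile(re.escape(NOT_FOUND))


def is_not_found(title):
    """Return True if the page title contains the not-found message."""
    # search() returns None when nothing matches, so it is safe in an if.
    # finditer() returns an iterator, which is truthy even with no matches.
    return title is not None and pattern.search(title) is not None


def record_url(url, title, fail_path="fail.csv", success_path="sucess.csv"):
    """Append the URL to the fail or success file, depending on the title."""
    path = fail_path if is_not_found(title) else success_path
    # Mode "a" appends; mode "w" would truncate the file on every call.
    with open(path, "a", newline="", encoding="utf-8") as f:
        csv.writer(f).writerow([url])
    return path
```

With the answer's code, this could be called as `record_url(r.url, title)`. Note that `BeautifulSoup(...).title` is `None` for a page without a `<title>` element, so that case would need its own check before reading `.string`.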
lojza