Regex dealing with Kanji characters in Python

Question

For this web-scraping project I'm working on, I've been trying to separate failed results from successful ones. Basically, if the title contains 指定されたページが見つかりません ("The specified page could not be found"), I want to copy the URL and write it to a fail.csv file. Otherwise, I want to copy the URL and write it to sucess.csv.

    html = 'www.abc.com'
    url = BeautifulSoup(html, 'html.parser').title.string
    pattern = re.compile(r' 指定されたページが見つかりません')
    if pattern.finditer(url):
        with open('fail.csv', 'w') as f:
            cw = csv.writer
            cw.writerow([url])
    else:
        pass  # move on, run some other code and write to sucess.csv

However, it seems that the regex isn't recognising 指定されたページが見つかりません.

Am I doing something wrong or missing something here?

Thanks

oh @lojza so i wrote soup.title.string to get titles from urls — I suck at this, Feb 07 '20 at 22:47
Please follow https://stackoverflow.com/help/minimal-reproducible-example — lojza, Feb 10 '20 at 08:15

score 0 · Answer 1 · answered Feb 17 '20 at 17:26

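Try installing the two third-party packages (re is part of the Python standard library, so it does not need to be installed with pip):

    sudo pip3 install requests
    sudo pip3 install beautifulsoup4

and then, under Python 3: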

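    import re

    import requests
    from bs4 import BeautifulSoup

    # Fetch the page and decode it as UTF-8 so the Japanese title is read correctly
    r = requests.get('https://corp.rakuten.co.jp/careers/life/')
    r.encoding = 'utf-8'

    # No leading space in the pattern: with one, the title only matches
    # if a space happens to precede the phrase
    pattern = re.compile(r'指定されたページが見つかりません')
    title = BeautifulSoup(r.text, 'html.parser').title.string

    # findall returns a list, which is empty (falsy) when the phrase is absent
    print(pattern.findall(title))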
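
As for why the code in the question probably misbehaves: pattern.finditer(...) returns an iterator object, and that object is truthy even when nothing matches, so the if branch would run for every title. The pattern there also begins with a space, and csv.writer is never called with the file. Below is a minimal sketch of the fail/success split, assuming the title has already been extracted. The names is_not_found and record are hypothetical helpers, not part of any library, and append mode is my assumption so that rows from earlier URLs are kept.

```python
import csv
import re

# Same phrase as in the question, without the leading space
NOT_FOUND = re.compile(r'指定されたページが見つかりません')

# finditer returns an iterator object, which is truthy even with no matches
print(bool(NOT_FOUND.finditer('no match here')))  # True


def is_not_found(title):
    """Return True if the title contains the 'page not found' phrase."""
    # title can be None when the page has no <title> text
    return bool(title) and NOT_FOUND.search(title) is not None


def record(url, title, fail_path='fail.csv', success_path='sucess.csv'):
    """Append url to the fail or success CSV and return the path used."""
    path = fail_path if is_not_found(title) else success_path
    # newline='' is what the csv module expects; utf-8 keeps non-ASCII intact
    with open(path, 'a', newline='', encoding='utf-8') as f:
        csv.writer(f).writerow([url])
    return path
```

Calling record(url, title) once per scraped page then sends each URL to the right file; search is used instead of finditer because it returns None when there is no match.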
