I am writing code that requests a web page, parses it, and extracts certain information from it.
Everything works perfectly except for this part (edited):
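import re
import time

import requests

start_time = time.time()
r = requests.get(item_url)
print(time.time() - start_time)  # elapsed since start_time (cumulative)
formatted_html = r.text.replace('=\r\n', '')
print(time.time() - start_time)  # cumulative: request + first r.text + replace
formatted_html = re.sub('=\r\n', '', r.text)
print(time.time() - start_time)  # cumulative: all of the above + second r.text + re.sub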
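To see where the time actually goes, it may help to time each step on its own instead of cumulatively from start_time. In requests, Response.text is a property that decodes Response.content every time it is accessed, and when requests cannot determine the encoding from the headers (Response.encoding is None) it falls back to Response.apparent_encoding, which runs character-set detection over the whole body. The sketch below separates decoding from the replacement. It works on raw bytes (as you would get from r.content) so it runs without a network call; the helper name time_steps and the default utf-8 encoding are assumptions for illustration.

```python
import time


def time_steps(raw: bytes, encoding: str = "utf-8"):
    """Decode raw bytes and strip quoted-printable soft line breaks,
    timing each stage separately (hypothetical helper for illustration)."""
    t0 = time.perf_counter()
    text = raw.decode(encoding)          # similar to what r.text does once the encoding is known
    t1 = time.perf_counter()
    fixed = text.replace("=\r\n", "")    # the replacement measured in the question
    t2 = time.perf_counter()
    return fixed, {"decode": t1 - t0, "replace": t2 - t1}


if __name__ == "__main__":
    # Synthetic ~100k-byte body: 73 characters plus a soft line break per line.
    line = "a" * 73 + "=\r\n"
    raw = (line * 1400).encode("utf-8")
    fixed, timings = time_steps(raw)
    print(len(raw), len(fixed))
    for step, seconds in timings.items():
        print(f"{step}: {seconds:.6f} s")
```

Note that the original snippet reads r.text twice, so any decoding cost is paid twice. Storing r.text in a variable once, or setting r.encoding explicitly before reading r.text (requests then uses that value instead of detecting one), may remove most of the measured time if decoding turns out to dominate.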
Output
0.9731616973876953
1.9460444450378418
3.0275654792785645
The text.replace call takes roughly 1 second to complete; it is used to fix a quoted-printable HTML string. There are also many web pages to process (split across threads), so I'm trying to speed up this fix. I also couldn't find a way to request the web page without it being quoted-printable.
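Since '=\r\n' is the quoted-printable soft line break, removing it only undoes part of the encoding: escapes such as =3D (for '=') remain in the text. If the goal is to fully decode the body, Python's standard-library quopri module can do it on the raw bytes before any text decoding happens. A minimal sketch, assuming the body is available as bytes (for example from r.content) and that the decoded bytes are UTF-8; decode_qp_html is a hypothetical name.

```python
import quopri


def decode_qp_html(raw: bytes, encoding: str = "utf-8") -> str:
    """Fully decode a quoted-printable body (hypothetical helper).

    quopri.decodestring removes soft line breaks ("=" at the end of a line)
    and turns escapes such as "=3D" back into the original bytes.
    """
    decoded = quopri.decodestring(raw)
    # Assumed encoding; undecodable bytes become U+FFFD instead of raising.
    return decoded.decode(encoding, errors="replace")


if __name__ == "__main__":
    sample = b'<a href=3D"/item">Item=\r\n name</a>'
    print(decode_qp_html(sample))
```

This changes the result compared with the original replace (escapes are decoded too), so it is an alternative rather than a drop-in speedup.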
Any ideas?
Edit: It takes a long time because the string is huge (100k+ characters).
Edit: I have also tried re.sub and str.split() followed by join; neither is significantly faster.
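For comparison, here is a small benchmark of the three approaches mentioned (str.replace, re.sub, and split/join) on an already-decoded synthetic string of about 100k characters. It is a sketch: the payload and the function names are made up, and absolute timings depend on the machine. If all three finish far below a second here, the time in the original snippet is probably being spent somewhere other than the replacement itself, for example in producing r.text.

```python
import re
import timeit

SOFT_BREAK = "=\r\n"
SOFT_BREAK_RE = re.compile(re.escape(SOFT_BREAK))  # compiled once, reused per call


def with_replace(s: str) -> str:
    return s.replace(SOFT_BREAK, "")


def with_re_sub(s: str) -> str:
    return SOFT_BREAK_RE.sub("", s)


def with_split_join(s: str) -> str:
    return "".join(s.split(SOFT_BREAK))


def make_payload(n_lines: int = 1400) -> str:
    # Synthetic quoted-printable-like text: 73 characters plus a soft break per line.
    return ("x" * 73 + SOFT_BREAK) * n_lines


if __name__ == "__main__":
    payload = make_payload()
    print(f"payload length: {len(payload)} characters")
    for func in (with_replace, with_re_sub, with_split_join):
        # Average time per call over 100 runs.
        seconds = timeit.timeit(lambda: func(payload), number=100) / 100
        print(f"{func.__name__}: {seconds * 1000:.3f} ms per call")
```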