Is there a faster alternative to Python's str.replace()?

Asked Jun 10 '20 at 10:33

Active Jun 11 '20 at 01:15

Viewed 695 times

I am writing a piece of code that requests a web page, parses it, and extracts certain information from it.

Everything works perfectly except for this part (edited):

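import re
import time

import requests

start_time = time.time()
r = requests.get(item_url)  # item_url: URL of the page to fetch, defined elsewhere
print(time.time() - start_time)  # elapsed after the request
formatted_html = r.text.replace('=\r\n', '')
print(time.time() - start_time)  # cumulative: request + str.replace
formatted_html = re.sub('=\r\n', '', r.text)
print(time.time() - start_time)  # cumulative: request + str.replace + re.sub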

Output

0.9731616973876953
1.9460444450378418
3.0275654792785645
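
Note that start_time is never reset in the snippet, so the three printed values are cumulative: the request, the str.replace step and the re.sub step each account for roughly one second. Both steps also read r.text again. The sketch below times each step on its own using a synthetic string instead of a real response; `timed` is a hypothetical helper and the test data is made up, so treat it as an illustration rather than a measurement of the original pages.

```python
import re
import time


def timed(func, *args):
    """Run func(*args) and return (result, elapsed_seconds) for that call alone."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


# Synthetic stand-in for the downloaded page (hypothetical data, about 150k characters).
text = ("x" * 73 + "=\r\n") * 2000

# Each call is timed separately, and both operate on the same already-built string.
replaced, t_replace = timed(text.replace, "=\r\n", "")
substituted, t_sub = timed(re.sub, "=\r\n", "", text)

print(f"str.replace: {t_replace:.6f} s")
print(f"re.sub:      {t_sub:.6f} s")
```

With a real response, binding `text = r.text` once and passing that to both calls would keep any cost of decoding the response (an assumption, not something confirmed in this thread) from being counted against the replacement.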

The r.text.replace call appears to take over 1 second and is used to fix a "quoted-printable" HTML string. There are also many web pages to cover (split across threads), so I'm trying to speed up this "fix". I also couldn't find a way to request the web page without it being quoted-printable.
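
For reference, the standard library's quopri module can decode quoted-printable data directly. The sketch below is a minimal example on a made-up byte string; it assumes that full decoding is acceptable, which goes further than the "fix" above because it also turns escapes such as =3D back into the characters they encode.

```python
import quopri

# Hypothetical quoted-printable fragment: "=\r\n" is a soft line break, "=3D" encodes "=".
raw = b'<a href=3D"page.html">a long li=\r\nnk</a>'

# decodestring takes bytes and returns bytes with soft line breaks removed
# and =XX escapes decoded.
decoded = quopri.decodestring(raw)
print(decoded)
```

With requests, the raw bytes would come from `r.content` rather than `r.text`, and the decoded bytes would still need to be decoded to str with the page's character set.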

Any ideas?

Edit: It takes a lot of time because the string is huge (100k+ characters).

Edit: I have tried re.sub, and String.split() followed by join; neither is significantly faster.

Harrison Seow

Are you sure that `.replace()` takes 1+ sec? You can type `print(r.elapsed)` to see how much time the actual request took. – Ivan Vinogradov Jun 10 '20 at 10:36
You can use multithreading if the threads are not dependent on each other.. – Rushabh Sudame Jun 10 '20 at 10:38
@HarrisonSeow Since you are parsing web pages that must involve a HUGE amount of text, the best solution for that is regex. Perhaps you might want to look at this question: https://stackoverflow.com/questions/4893506/fastest-python-method-for-search-and-replace-on-a-large-string – Prince Jun 10 '20 at 10:39
@IvanVinogradov I've added the way I measured the time for the function (with only 1 thread) – Harrison Seow Jun 10 '20 at 10:40
@RushabhSudame I thought threading is multithreading already, is there any difference? `t = threading.Thread(target=processMHT, kwargs=item)` is what I'm using – Harrison Seow Jun 10 '20 at 10:51
@Prince I've seen them and tried but they're basically the same, and I believe String.replace() is faster? – Harrison Seow Jun 10 '20 at 10:52
@HarrisonSeow you edited the question after comment.. – Rushabh Sudame Jun 10 '20 at 10:54
see https://docs.python.org/3/library/re.html#re.sub – balderman Jun 10 '20 at 10:54
@RushabhSudame It was always there (about threading), you can check the edit history. – Harrison Seow Jun 11 '20 at 01:07
@balderman Care to explain how to use it so that it's faster? I wasn't able to do it. – Harrison Seow Jun 11 '20 at 01:07
See https://lzone.de/examples/Python%20re.sub for examples – balderman Jun 11 '20 at 05:04
@balderman I've tried it the same way as the example and there's no significant reduction – Harrison Seow Jun 11 '20 at 10:33

0 Answers