Is there a faster way to clean emojis from a cuDF string series?
I am currently using emoji==1.7.0 and retrieving its emoji regex pattern, since cuDF doesn't support the emoji library directly (that is, I can't apply emoji.get_emoji_regexp().sub("", string) over a series):
emoji_pattern = emoji.get_emoji_regexp()
I then apply it over the cuDF series:
cu_df.post_content.str.replace(
    emoji_pattern,
    "",
    regex=True
)
This operation, however, is extremely slow: slower than if I had used multiprocessing on CPUs (~45 min on CPUs vs. over 1 hour with cuDF). Does anyone know why, and what I am doing wrong?
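
To make the intended behavior concrete, here is a minimal CPU-side sketch of the per-string substitution, using only Python's re module. The EMOJI_STANDIN pattern and the strip_emojis helper are hypothetical stand-ins for illustration; the actual pattern returned by emoji.get_emoji_regexp() is much larger and is not reproduced here.

```python
import re

# Hypothetical stand-in pattern covering a few common emoji code-point ranges.
# This is NOT the pattern from emoji.get_emoji_regexp(), which is far larger.
EMOJI_STANDIN = re.compile(
    "["
    "\U0001F300-\U0001F5FF"  # symbols and pictographs
    "\U0001F600-\U0001F64F"  # emoticons
    "\U0001F680-\U0001F6FF"  # transport and map symbols
    "\u2600-\u27BF"          # miscellaneous symbols and dingbats
    "]+"
)


def strip_emojis(text: str) -> str:
    """Remove every match of the stand-in emoji pattern from text."""
    return EMOJI_STANDIN.sub("", text)


# Example posts (made up for illustration).
posts = ["great day \U0001F600", "no emoji here", "\U0001F680 launch \U0001F680"]
cleaned = [strip_emojis(p) for p in posts]
```

The result I want from cuDF is the same as `cleaned` here, just computed over the whole series on the GPU.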