String methods fail with Modin, but the same code works with Pandas

Question

I'm trying to speed up the processing of several large log files so that I can extract some metrics and store them in a Postgres database. For now I'm only working on the first step: loading the log lines and keeping just the relevant ones.

This is the sample code that currently works in regular Pandas:

import pandas as pd

fp = "server.log"
data_lines = []

with open(fp, "rt", encoding="utf8") as file:
    lines = file.readlines()
    # data_lines += [
    #     line for line in lines
    #     if "POST" in line
    # ]
    data_lines += lines

# Processing
df = pd.DataFrame({"src": data_lines})
df.src = df.src.astype("string")

df = df[df.src.str.contains("POST")]

However, when I replace `import pandas as pd` with `import modin.pandas as pd`, I get this error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xee in position 67: invalid continuation byte

As shown, the text file is being opened with the correct encoding, and no error is thrown when the same code runs with Pandas. Please let me know if this is not the intended way to use Modin.
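For reference, the commented-out list comprehension in the code above hints at a possible workaround: do the substring filter in plain Python before building the DataFrame, so that `str.contains` is never called on the Modin object. The following is only a minimal sketch of that idea. The helper names `iter_matching_lines` and `load_matching_lines` are hypothetical, and this sidesteps the error rather than explaining it.

```python
from typing import Iterable, Iterator, List


def iter_matching_lines(lines: Iterable[str], needle: str = "POST") -> Iterator[str]:
    """Yield only the lines that contain `needle` (plain substring test)."""
    for line in lines:
        if needle in line:
            yield line


def load_matching_lines(path: str, needle: str = "POST") -> List[str]:
    """Stream the file line by line and keep only the matching lines."""
    with open(path, "rt", encoding="utf8") as file:
        return list(iter_matching_lines(file, needle))


# Usage (assumes the "server.log" file from the question exists):
# import modin.pandas as pd   # or: import pandas as pd
# data_lines = load_matching_lines("server.log")
# df = pd.DataFrame({"src": data_lines})
```

Because the file is iterated lazily instead of being read with `readlines()`, only the matching lines are kept in memory, which may also help with very large logs.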

WHERE do you get that error? Which line? – Tim Roberts, Aug 17 '22 at 22:27
On the last one, with the `contains` method – Cristian Acuña, Aug 17 '22 at 22:34
You've stumped me. You may need to ask the `modin` folks. – Tim Roberts, Aug 17 '22 at 23:51

0 Answers