I have a CSV file with several columns that contain integers but also one particular string value. Naturally, pandas emits a DtypeWarning because of the mixed types. I read the file with this general command:
df = pd.read_csv(path, sep=";", na_values=missing)
I could pass low_memory=False or dtype=object to silence the warning, but as far as I know neither makes reading the file more memory efficient (with dtype=object every value is stored as a Python object).
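One way to check the memory claim on a toy example is to compare the deep memory usage of the same column parsed with the default dtype and with dtype=object. This is a minimal sketch on a made-up in-memory CSV (the column name `b` and its values are illustrative assumptions); exact byte counts depend on the pandas and Python versions, only the comparison matters.

```python
import io

import pandas as pd

# Hypothetical sample: a single integer column "b" with 1,000 rows.
csv_text = "b\n" + "\n".join(str(i) for i in range(1000)) + "\n"

as_int = pd.read_csv(io.StringIO(csv_text))["b"]                # parsed as an integer dtype
as_obj = pd.read_csv(io.StringIO(csv_text), dtype=object)["b"]  # one Python str per cell

# deep=True also counts the Python objects an object column points to, not just the pointers.
int_bytes = as_int.memory_usage(index=False, deep=True)
obj_bytes = as_obj.memory_usage(index=False, deep=True)
print(as_int.dtype, int_bytes)
print(as_obj.dtype, obj_bytes)
```

The object column pays for a pointer plus a separate string object per cell, so it is considerably larger than the fixed-width integer column.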
I could also add "my_string" to na_values, but the file contains other values that are genuinely missing, and I do not want to mix the two.
I do not need the string itself, only how often it occurs, so I thought of replacing it with an integer, something like this:
df = df.replace(to_replace="my_string", value=999)
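`replace` on its own leaves the column non-numeric; the column only shrinks once it is converted to a numeric dtype. A minimal sketch, assuming a hypothetical column `b` and reusing 999 as the sentinel (assumed never to occur as a real value), converts with `pd.to_numeric` and then counts the sentinel and the real missing values separately:

```python
import io

import pandas as pd

# Hypothetical sample: column "b" holds integers, one "my_string" and one empty (missing) cell.
csv_text = "a;b\n1;10\n2;my_string\n3;\n4;7\n"
df = pd.read_csv(io.StringIO(csv_text), sep=";")
# At this point "b" is a non-numeric column because of "my_string".

# Swap the string for the sentinel, then convert the whole column to a numeric dtype.
# It becomes float64 rather than int64 because the genuinely missing cell is NaN.
df["b"] = pd.to_numeric(df["b"].replace("my_string", 999))

sentinel_count = int((df["b"] == 999).sum())  # occurrences of "my_string"
missing_count = int(df["b"].isna().sum())     # real missing values, kept separate
print(df["b"].dtype, sentinel_count, missing_count)
```

This still parses the column as strings first, so peak memory during reading is not reduced; only the final DataFrame is smaller.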
Is it possible to do this replacement while reading the CSV file, or is there another solution? I do not just want to silence the warning; I want a more memory-efficient solution.
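One way to replace the value during reading is the `converters` argument of `pd.read_csv`, which maps a column name to a function applied to each raw cell. A minimal sketch, assuming a hypothetical column `b`, the sentinel 999, and a hypothetical set `MISSING` standing in for the file's real missing-value markers. Because the converter sees the raw cell text (an empty cell arrives as an empty string), the sketch handles missing markers itself instead of relying on `na_values`; other columns can still use `na_values` as before.

```python
import io
import math

import pandas as pd

SENTINEL = 999               # replacement for "my_string"; assumed not to occur as real data
MISSING = {"", "NA", "n/a"}  # hypothetical stand-ins for the file's real missing-value markers


def convert_b(cell: str) -> float:
    """Map one raw cell of the hypothetical column "b" to a number."""
    cell = cell.strip()
    if cell == "my_string":
        return SENTINEL
    if cell in MISSING:
        return math.nan  # keep real missing values as NaN, distinct from the sentinel
    return int(cell)


csv_text = "a;b\n1;10\n2;my_string\n3;\n4;NA\n5;7\n"
df = pd.read_csv(io.StringIO(csv_text), sep=";", converters={"b": convert_b})

print(df["b"].dtype)                     # numeric, not object
print(int((df["b"] == SENTINEL).sum()))  # how often "my_string" occurred
```

The trade-off is speed: the converter is a Python function called once per cell, so parsing is slower than the C parser's native numeric conversion. In return, every chunk of `b` comes back numeric, which in my understanding also avoids the mixed-type warning for that column.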
(I know about this answer but it does not really help me with my problem.)