
I have code roughly like this:

df[['floatcol1', 'floatcol2', 'floatcol3']] = df[['floatcol1', 'floatcol2', 'floatcol3']].astype(str)

df[['strfloatcol1', 'strfloatcol2', 'strfloatcol3']] = df[['strfloatcol1', 'strfloatcol2', 'strfloatcol3']].replace(',', '.')

But it is still printing my values like 527,1 and 847,9 instead of 527.1 and 847.9 like I want. I'm confused why `replace` isn't replacing the commas.

  • Missing `regex=True` -> `.replace(',', '.', regex=True)` – Henry Ecker Dec 14 '21 at 03:48
  • @HenryEcker No, that isn't the problem. As asked, we want to replace literal commas. – Karl Knechtel Dec 14 '21 at 03:51
  • I don't understand @KarlKnechtel Without regex=True `replace` will only swap exact string matches. _i.e._ cells that contain only a single comma. It appears that we want to replace commas with dot within strings. – Henry Ecker Dec 14 '21 at 03:52
  • OP: Do you *expect* the `.astype(str)` result to contain commas? Or are you trying to fix a bad localization setting? Or just what? – Karl Knechtel Dec 14 '21 at 03:52
  • @HenryEcker what is there not to understand? Where it says `.replace(',', '.')` in the original code, there is no desire to use regular expressions, and therefore no reason to write `regex=True`. – Karl Knechtel Dec 14 '21 at 03:53
  • That makes no sense @KarlKnechtel Without `regex=True` it will only replace exact string matches: cells that contain _only_ a single comma would be replaced. With `regex=True` it _will_ replace commas contained within cells alongside other text. This is the standard way to solve this problem at the DataFrame level. – Henry Ecker Dec 14 '21 at 03:54
  • I'm trying to replace a comma within a larger string. The strings I'm given are using commas as a decimal point, but I need to use a normal period in order to change their dtype and operate on them. So like in the example, I want 572,1 --> 572.1. – winterdiablo Dec 14 '21 at 04:03
  • It seems I am incorrect. Very strange behaviour; I expect the `regex` flag to control whether regexes are used, but not *also* to be the way to specify "look for matches within each string rather than trying to match the entire string". – Karl Knechtel Dec 14 '21 at 04:03
  • @Grantholomeu: I don't think you understand. You got your strings by converting floats to string, correct? You expect strings that represent a floating point number to use the `.` symbol to represent a decimal point, rather than the `,` symbol, correct? But the ones you *actually got*, use `,` instead, correct? What I am saying is that *you should fix how things are set up so that your Pandas installation understands that `.` should be used as a decimal point*. – Karl Knechtel Dec 14 '21 at 04:05
  • Are you reading in from a CSV @Grantholomeu or are you generating these strings some other way? – Henry Ecker Dec 14 '21 at 04:08
  • Yes, they're read in from a CSV. I just checked the dtype and found that df['col'] is coming back as a dataframe object. I'm confused how the column can be a dataframe. – winterdiablo Dec 14 '21 at 04:22
  • are you sure you checked `df['col']` instead of `df[['col']]`? the former will return a series, the latter a dataframe – tdy Dec 14 '21 at 04:29
  • tdy, I think that was part of the problem. – winterdiablo Dec 14 '21 at 04:42
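The behaviour discussed in the comments can be checked with a minimal sketch (hypothetical data): without `regex=True`, `DataFrame.replace` only swaps cells that match the pattern exactly, while with `regex=True` it substitutes within strings.

```python
import pandas as pd

# One cell with a comma inside a larger string, one cell that is only a comma.
df = pd.DataFrame({'val': ['527,1', ',']})

exact = df.replace(',', '.')               # only the lone-comma cell changes
within = df.replace(',', '.', regex=True)  # commas inside strings change too

print(exact['val'].tolist())   # ['527,1', '.']
print(within['val'].tolist())  # ['527.1', '.']
```

Since the values come from a CSV, an alternative is to let pandas parse them as floats directly with `pd.read_csv(..., decimal=',')`, which avoids the string round-trip entirely.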

2 Answers


Try with:

df[['strfloatcol1', 'strfloatcol2', 'strfloatcol3']] = df[['strfloatcol1', 'strfloatcol2', 'strfloatcol3']].replace({',': '.'}, regex=True)

(Assign the result back rather than using `inplace=True`: calling `replace` with `inplace=True` on a selection like `df[[...]]` modifies a temporary copy, so the original DataFrame would be left unchanged.)
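A runnable sketch of this dict-based replace (the column values are made up for illustration); the result is assigned back because `inplace=True` on a `df[[...]]` selection would act on a copy:

```python
import pandas as pd

# Hypothetical three-column frame standing in for the OP's data.
df = pd.DataFrame({
    'strfloatcol1': ['527,1', '847,9'],
    'strfloatcol2': ['1,5', '2,25'],
    'strfloatcol3': ['0,1', '0,2'],
})

cols = ['strfloatcol1', 'strfloatcol2', 'strfloatcol3']
df[cols] = df[cols].replace({',': '.'}, regex=True)  # substitute within strings
df[cols] = df[cols].astype(float)                    # now the cast succeeds

print(df['strfloatcol1'].tolist())  # [527.1, 847.9]
```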
BENY

I got it. You have to use:

df['col'] = df['col'].str.replace(',', '.', regex=True).astype(float)

for each column. The `.str` accessor only exists on a Series, not on a DataFrame, so selecting a list of columns like `df[['col1', 'col2']]` and calling `.str.replace` on it raises an error. There may be an easier, more effective way, but I only have three columns to convert.
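If the per-column repetition gets tedious, one way (sketched here with hypothetical column names) is to map `Series.str.replace` over the selection with `DataFrame.apply`:

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ['527,1', '847,9'],
    'col2': ['1,5', '2,25'],
})

cols = ['col1', 'col2']
# .str is Series-only, so apply the replacement column by column,
# then cast the whole selection to float in one step.
df[cols] = df[cols].apply(lambda s: s.str.replace(',', '.', regex=False)).astype(float)

print(df['col1'].tolist())  # [527.1, 847.9]
```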

holydragon