
I have code roughly like this:

df[['floatcol1', 'floatcol2', 'floatcol3']] = df[['floatcol1', 'floatcol2', 'floatcol3']].astype(str)

df[['strfloatcol1', 'strfloatcol2', 'strfloatcol3']] = df[['strfloatcol1', 'strfloatcol2', 'strfloatcol3']].replace(',', '.')

But it is still printing my values like 527,1 and 847,9 instead of 527.1 and 847.9 like I want. I'm confused why `replace` isn't replacing the commas.

  • Missing `regex=True` -> `.replace(',', '.', regex=True)` – Henry Ecker Dec 14 '21 at 03:48
  • @HenryEcker No, that isn't the problem. As asked, we want to replace literal commas. – Karl Knechtel Dec 14 '21 at 03:51
  • I don't understand @KarlKnechtel Without regex=True `replace` will only swap exact string matches. _i.e._ cells that contain only a single comma. It appears that we want to replace commas with dot within strings. – Henry Ecker Dec 14 '21 at 03:52
  • OP: Do you *expect* the `.astype(str)` result to contain commas? Or are you trying to fix a bad localization setting? Or just what? – Karl Knechtel Dec 14 '21 at 03:52
  • @HenryEcker what is there not to understand? Where it says `.replace(',', '.')` in the original code, there is no desire to use regular expressions, and therefore no reason to write `regex=True`. – Karl Knechtel Dec 14 '21 at 03:53
  • That makes no sense @KarlKnechtel Without `regex=True` it will only replace exact string matches: cells that contain _only_ a single comma would be replaced. With `regex=True` it _will_ replace commas contained within cells alongside other text. This is the standard way to solve this problem at the DataFrame level. – Henry Ecker Dec 14 '21 at 03:54
  • I'm trying to replace a comma within a larger string. The strings I'm given are using commas as a decimal point, but I need to use a normal period in order to change their dtype and operate on them. So like in the example, I want 572,1 --> 572.1. – winterdiablo Dec 14 '21 at 04:03
  • It seems I am incorrect. Very strange behaviour; I expect the `regex` flag to control whether regexes are used, but not *also* to be the way to specify "look for matches within each string rather than trying to match the entire string". – Karl Knechtel Dec 14 '21 at 04:03
  • @Grantholomeu: I don't think you understand. You got your strings by converting floats to string, correct? You expect strings that represent a floating point number to use the `.` symbol to represent a decimal point, rather than the `,` symbol, correct? But the ones you *actually got*, use `,` instead, correct? What I am saying is that *you should fix how things are set up so that your Pandas installation understands that `.` should be used as a decimal point*. – Karl Knechtel Dec 14 '21 at 04:05
  • Are you reading in from a CSV @Grantholomeu or are you generating these strings some other way? – Henry Ecker Dec 14 '21 at 04:08
  • Yes, they're read in from a CSV. I just checked the dtype and found that df['col'] is coming back as a dataframe object. I'm confused how the column can be a dataframe. – winterdiablo Dec 14 '21 at 04:22
  • are you sure you checked `df['col']` instead of `df[['col']]`? the former will return a series, the latter a dataframe – tdy Dec 14 '21 at 04:29
  • tdy, I think that was part of the problem. – winterdiablo Dec 14 '21 at 04:42
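The behaviour discussed in the comments can be checked with a minimal sketch (hypothetical data): without `regex=True`, `DataFrame.replace` only swaps cells that match the pattern exactly, while with `regex=True` it substitutes within strings.

```python
import pandas as pd

# One cell with a comma inside a larger string, one cell that is only a comma.
df = pd.DataFrame({'val': ['527,1', ',']})

exact = df.replace(',', '.')               # only the lone-comma cell changes
within = df.replace(',', '.', regex=True)  # commas inside strings change too

print(exact['val'].tolist())   # ['527,1', '.']
print(within['val'].tolist())  # ['527.1', '.']
```

Since the values come from a CSV, an alternative is to let pandas parse them as floats directly with `pd.read_csv(..., decimal=',')`, which avoids the string round-trip entirely.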

2 Answers


Try with:

df[['strfloatcol1', 'strfloatcol2', 'strfloatcol3']] = df[['strfloatcol1', 'strfloatcol2', 'strfloatcol3']].replace({',': '.'}, regex=True)

(Assign the result back rather than using `inplace=True`: calling `replace` with `inplace=True` on a selection like `df[[...]]` modifies a temporary copy, so the original DataFrame would be left unchanged.)
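A runnable sketch of this dict-based replace (the column values are made up for illustration); the result is assigned back because `inplace=True` on a `df[[...]]` selection would act on a copy:

```python
import pandas as pd

# Hypothetical three-column frame standing in for the OP's data.
df = pd.DataFrame({
    'strfloatcol1': ['527,1', '847,9'],
    'strfloatcol2': ['1,5', '2,25'],
    'strfloatcol3': ['0,1', '0,2'],
})

cols = ['strfloatcol1', 'strfloatcol2', 'strfloatcol3']
df[cols] = df[cols].replace({',': '.'}, regex=True)  # substitute within strings
df[cols] = df[cols].astype(float)                    # now the cast succeeds

print(df['strfloatcol1'].tolist())  # [527.1, 847.9]
```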
BENY

I got it. You have to use:

df['col'] = df['col'].str.replace(',', '.', regex=True).astype(float)

for each column. The `.str` accessor only exists on a Series, not on a DataFrame, so selecting a list of columns like `df[['col1', 'col2']]` and calling `.str.replace` on it raises an error. There may be an easier, more effective way, but I only have three columns to convert.
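If the per-column repetition gets tedious, one way (sketched here with hypothetical column names) is to map `Series.str.replace` over the selection with `DataFrame.apply`:

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ['527,1', '847,9'],
    'col2': ['1,5', '2,25'],
})

cols = ['col1', 'col2']
# .str is Series-only, so apply the replacement column by column,
# then cast the whole selection to float in one step.
df[cols] = df[cols].apply(lambda s: s.str.replace(',', '.', regex=False)).astype(float)

print(df['col1'].tolist())  # [527.1, 847.9]
```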

holydragon