Convert commas decimal separators to dots within a Dataframe

Question

I am importing a CSV file like the one below, using pandas.read_csv:

df = pd.read_csv(Input, delimiter=";")

Example of CSV file:

10;01.02.2015 16:58;01.02.2015 16:58;-0.59;0.1;-4.39;NotApplicable;0.79;0.2
11;01.02.2015 16:58;01.02.2015 16:58;-0.57;0.2;-2.87;NotApplicable;0.79;0.21

The problem is that when I later on in my code try to use these values I get this error: TypeError: can't multiply sequence by non-int of type 'float'

The error is because the number I'm trying to use is not written with a dot (.) as a decimal separator but a comma(,). After manually changing the commas to a dots my program works.

I can't change the format of my input, and thus have to replace the commas in my DataFrame in order for my code to work, and I want python to do this without the need of doing it manually. Do you have any suggestions?

score 157 · Accepted Answer · answered Jul 29 '15 at 12:44

157

pandas.read_csv has a decimal parameter for this: doc

I.e. try with:

df = pd.read_csv(Input, delimiter=";", decimal=",")

answered Jul 29 '15 at 12:44

stellasia

5,372
4
23
43

score 26 · Answer 2 · answered Feb 06 '17 at 16:39

I think the earlier mentioned answer of including decimal="," in pandas read_csv is the preferred option.

However, I found it is incompatible with the Python parsing engine. e.g. when using skiprow=, read_csv will fall back to this engine and thus you can't use skiprow= and decimal= in the same read_csv statement as far as I know. Also, I haven't been able to actually get the decimal= statement to work (probably due to me though)

The long way round I used to achieving the same result is with list comprehensions, .replace and .astype. The major downside to this method is that it needs to be done one column at a time:

df = pd.DataFrame({'a': ['120,00', '42,00', '18,00', '23,00'], 
                'b': ['51,23', '18,45', '28,90', '133,00']})

df['a'] = [x.replace(',', '.') for x in df['a']]

df['a'] = df['a'].astype(float)

Now, column a will have float type cells. Column b still contains strings.

Note that the .replace used here is not pandas' but rather Python's built-in version. Pandas' version requires the string to be an exact match or a regex.

This workaround is also needed for reading excel files, as `pd.read_excel` doesn't have the `decimal` parameter. — Daddy32, Oct 09 '20 at 08:37
It can be useful the function `pandas.to_numeric` [docs to_numeric](https://pandas.pydata.org/docs/reference/api/pandas.to_numeric.html) where you can control errors on the conversion instead of `astype(float)`. — nachouve, Dec 17 '21 at 08:59
Beware!, with this solution, for numbers containing more than one comma, numbers like `2,340,00` become `2,340.00`. Go with @edhaussy answer for peace of mind. — trazoM, Jan 24 '23 at 09:59

score 16 · Answer 3 · answered May 20 '21 at 07:57

16

stallasia's answer looks like the best one.

However, if you want to change the separator when you already have a dataframe, you could do :

df['a'] = df['a'].str.replace(',', '.').astype(float)

answered May 20 '21 at 07:57

edhaussy

346
2
6

score 8 · Answer 4 · answered Jan 17 '22 at 22:01

8

Thanks for the great answers. I just want to add that in my case just using decimal=',' did not work because I had numbers like 1.450,00 (with thousands separator), therefore pandas did not recognize it, but passing thousands='.' helped to read the file correctly:

df = pd.read_csv(
    Input, 
    delimiter=";", 
    decimal=","
    thousands="."
)

answered Jan 17 '22 at 22:01

Felix

356
4
7

This does not provide an answer to the question. Once you have sufficient [reputation](https://stackoverflow.com/help/whats-reputation) you will be able to [comment on any post](https://stackoverflow.com/help/privileges/comment); instead, [provide answers that don't require clarification from the asker](https://meta.stackexchange.com/questions/214173/why-do-i-need-50-reputation-to-comment-what-can-i-do-instead). - [From Review](/review/late-answers/30841740) – Register Sole Jan 21 '22 at 03:50

score 5 · Answer 5 · answered Oct 02 '17 at 15:00

I answer to the question about how to change the decimal comma to the decimal dot with Python Pandas.

$ cat test.py 
import pandas as pd
df = pd.read_csv("test.csv", quotechar='"', decimal=",")
df.to_csv("test2.csv", sep=',', encoding='utf-8', quotechar='"', decimal='.')

where we specify the reading in decimal separator as comma while the output separator is specified as dot. So

$ cat test.csv 
header,header2
1,"2,1"
3,"4,0"
$ cat test2.csv 
,header,header2
0,1,2.1
1,3,4.0

where you see that the separator has changed to dot.

Convert commas decimal separators to dots within a Dataframe

5 Answers5

Linked

Related