
I have a Pandas DataFrame with a column of float dtype, but numbers with decimals don't make sense for this column. So I want to find out how many non-integer floats are in this column, and after that I want to delete every row that has a non-integer float in this column. An alternative could be to count the number of integer values and subtract that from the total number of rows.

Example dataset:

What I have:

  A    B    C
0.5  0.1  2.0
0.8  0.9  3.5
0.6  0.2  1.0

What I need:

First, count the floats (or the integers):

C    1   (as there is only one non-integer float in column "C"), or alternatively:
C    2   (as there are two integer values in column "C")

Second, delete the rows with non-integer floats:

  A    B    C
0.5  0.1  2.0
0.6  0.2  1.0

I tried to handle this problem the same way I handled my missing values, but that did not work:

# Count Integers
print(Data.is_integer().sum())
# Delete rows where "C" is not an integer
Data=Data.drop(Data[Data.C.is_integer()=0].index)

Neither worked. I am using Python in Colab, by the way.

akfin

4 Answers


You could use the modulus operator to remove any rows where column C has a number with a decimal part.

You might also want to convert column 'C' to integer.

import pandas as pd

df = pd.DataFrame({'A': [0.5, 0.8, 0.6], 'B': [0.1, 0.9, 0.2], 'C': [2.0, 3.5, 1.0]})

# Keep only the rows where C has no fractional part
df = df[df['C'] % 1 == 0]

# Optionally convert C to an integer dtype
df['C'] = df['C'].astype(int)
norie

Integers and floats are two different kinds of numerical data. An integer (more commonly called an int) is a number without a decimal point. A float is a floating-point number, which means it is a number that has a decimal part; floats are used when more precision is needed. So in this case floats are used for the whole column, even where you don't need the decimal part.
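To illustrate the distinction: a float can hold a whole-number value while still being a float, and Python's built-in `float.is_integer()` tells you whether the decimal part is zero:

```python
# A float can represent a whole number exactly, but its type stays float
x = 2.0
print(type(x))             # <class 'float'>
print(x.is_integer())      # True  (no fractional part)
print((3.5).is_integer())  # False (has a fractional part)
```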

Track_suit
  • The column C in my example is something like "number of cars". A value like 3.5 does not make sense for this column, as you can't have half a car. So these data points are mistakes and therefore useless to me. I want to identify those mistakes and delete the whole row. The values in my column are actually all floats, but the decimals should be 0 for all of them. That is why I tried the is_integer method, because it returns True for 2.0 and False for 3.5. – akfin Mar 06 '21 at 15:53

Based on my understanding, you can try mod and filter in the rows whose mod is 0:

df[df['C'].mod(1).eq(0)]

     A    B    C
0  0.5  0.1  2.0
2  0.6  0.2  1.0
anky

You could write a function that returns the number when the condition `.is_integer() == True` holds, and otherwise returns `numpy.nan`; later, use the `.dropna()` function to drop all the NaN rows.
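A minimal sketch of that approach (the function name `keep_if_int` is just illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [0.5, 0.8, 0.6], 'B': [0.1, 0.9, 0.2], 'C': [2.0, 3.5, 1.0]})

def keep_if_int(x):
    # Return the value if it has no decimal part, otherwise NaN
    return x if float(x).is_integer() else np.nan

df['C'] = df['C'].apply(keep_if_int)
df = df.dropna(subset=['C'])  # drop the rows where C became NaN
```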

Akmal Soliev