Function to check if object-dtype column value is float or string

Question

I am trying to write a function equivalent to Excel's ISNUMBER function, applied to a column.

dataset:

feature1 feature2 feature3
  123       1.07     1
  231       2.08     3
  122        ab      4
  111       3.04     6
  555        cde     8

feature1: integer dtype
feature2: object dtype
feature3: integer dtype

I tried this piece of code

for item in df.feature2.iteritems():
    if isinstance(item, float):
       print('yes')
    else:
       print('no')

I got the result as

 no
 no
 no
 no
 no

But I want the result as

yes
yes
no
yes
no

When I checked the type of the individual feature2 values, this is what I saw:

type(df.feature2[0]) = str
type(df.feature2[1]) = str
type(df.feature2[2]) = str
type(df.feature2[3]) = str
type(df.feature2[4]) = str

Clearly the values at positions 0, 1 and 3 should be floats, but they show up as str.

What am I doing wrong?

Ankur Gulati · Answer 1 · 2018-11-21T19:01:54.293

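iteritems() returns (index, value) tuples, such as (0, 1.07), rather than the values themselves, so isinstance(item, float) is always False. Since you want to loop over each value, try the code below: you just need to remove .iteritems(). This works as long as the column holds real floats rather than strings such as '1.07'.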

df['feature2']=[1.07,2.08,'ab',3.04,'cde']
for item in df.feature2:
    if isinstance(item,float):
       print('yes')
    else:
       print('no')

Here is your output:

yes
yes
no
yes
no

edited Nov 21 '18 at 19:01

answered Nov 21 '18 at 18:42

Ankur Gulati


If it helps, please care to accept and upvote the answer. Thanks :) – Ankur Gulati Nov 21 '18 at 18:43
@SaiSumanth Can you tell me what the error is? I included the data frame creation line that I used for testing and it is working for me. Also, I am using Python 3 – Ankur Gulati Nov 21 '18 at 19:03
It's working now. My feature values were actually float strings (str) rather than floats. Thanks – Sai Sumanth Nov 21 '18 at 19:20
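For readers whose column holds numeric strings, as the asker's did, isinstance will never report a float. One possible approach, sketched here under the assumption that "is a number" means "can be parsed as a number", is pd.to_numeric with errors="coerce". The helper name is_number and the sample frame are illustrative and not part of the original thread.

```python
import pandas as pd

# Hypothetical reconstruction of the asker's column: every value is a str.
df = pd.DataFrame({"feature2": ["1.07", "2.08", "ab", "3.04", "cde"]})


def is_number(column: pd.Series) -> pd.Series:
    """Return True where a value can be parsed as a number (illustrative helper)."""
    # errors="coerce" turns unparseable values into NaN instead of raising.
    return pd.to_numeric(column, errors="coerce").notna()


mask = is_number(df["feature2"])
labels = ["yes" if flag else "no" for flag in mask]
print(labels)  # ['yes', 'yes', 'no', 'yes', 'no']
```

Note that with this sketch a missing value, or the literal string "nan", is reported as "no", because both end up as NaN after coercion.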

score 1 · Answer 2 · answered Jul 14 '20 at 06:42

I think there are two things you need to consider here:

Methods for Dict vs DataFrame
Difference between dtype (array-scalar types) and type (built-in Python types) - Reference (https://numpy.org/devdocs/reference/arrays.dtypes.html)

Point 1:

.iteritems() / .items() are best known as dictionary methods, but a pandas Series has them too, and there they yield (index, value) pairs. Judging by the data you've provided, you're working with a DataFrame column, so you don't need .iteritems() to loop through each value; iterating over the column directly gives you the values. Side note: .iteritems() was removed from dictionaries in Python 3 in favour of .items(), and pandas likewise deprecated Series.iteritems() in version 1.5 and removed it in 2.0 (See discussion: When should iteritems() be used instead of items()?)

Point 2:

When using NumPy or pandas, the data type of a column or array is called its dtype. This needs to be distinguished from the type of each individual value, which Python reports through type(). An object-dtype column, for example, can hold a mix of str and float values. You can use the table under the "Pandas Data Types" heading for a mapping of dtype to type (Ref: https://pbpython.com/pandas_dtypes.html)
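A minimal sketch of the dtype versus type distinction, using a small made-up column that mixes floats and a string (the variable names are illustrative):

```python
import pandas as pd

# A column mixing floats and a string is stored with the generic object dtype.
df = pd.DataFrame({"feature2": [1.07, "ab", 2.08]})

# dtype describes the column as a whole.
column_dtype = df["feature2"].dtype

# type() describes each individual value stored in the column.
element_types = [type(value).__name__ for value in df["feature2"]]

print(column_dtype)   # object
print(element_types)  # ['float', 'str', 'float']
```

The column-level dtype says nothing about which individual values are numbers, which is why the check has to be done per element.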

Now, in response to your question, this bit of code should solve your issue:

import pandas as pd

columns = ['feature1', 'feature2', 'feature3']
data = [[123, 1.07, 1],
        [231, 2.08, 3],
        [122, 'ab', 4],
        [111, 3.04, 6],
        [555, 'cde', 8]]

df = pd.DataFrame(data, columns=columns)

for value in df.feature2:
    if isinstance(value,float):
        print('yes')
    else:
        print('no')

score 0 · Answer 3 · answered Nov 21 '18 at 18:37

Try this:

for i in range(len(df["feature2"])):
    test = df.loc[i,"feature2"]
    if isinstance(test, float):
        print('yes')
    else:
        print('no')

answered Nov 21 '18 at 18:37

Ellie Hanna


bear in mind that this just tests for floats - if you want any number, float or integer, you'd have to change the third line to if isinstance(test, float) or isinstance(test, int): – Ellie Hanna Nov 21 '18 at 18:39
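The two isinstance calls in the comment above can also be written as a single call with a tuple of types. The sketch below assumes you do not want True/False counted as numbers, which matters because bool is a subclass of int in Python; the helper name is_int_or_float and the sample data are illustrative.

```python
import pandas as pd

# Object-dtype column holding an int, a float, a string and a bool.
df = pd.DataFrame({"feature2": [1, 2.08, "ab", True]})


def is_int_or_float(value) -> bool:
    """Illustrative helper: True for int or float values, False otherwise."""
    # bool is a subclass of int, so it is excluded explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


results = [is_int_or_float(value) for value in df["feature2"]]
print(results)  # [True, True, False, False]
```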

score 0 · Answer 4 · answered Nov 21 '18 at 18:38

This is because iteritems() returns a tuple which is the (index, value). So you are trying to check for example if (0, 1.07) or (1, 2.08) are of type float, which they aren't of course.

It should work if you change df.feature2.iteritems() to df.feature2.values :)
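To see the tuples for yourself, here is a small sketch on made-up data. It uses .items(), which yields the same (index, value) pairs, because .iteritems() is no longer available in pandas 2.0 and later.

```python
import pandas as pd

s = pd.Series([1.07, 2.08, "ab"], name="feature2")

# Each element produced by .items() is an (index, value) tuple.
pairs = list(s.items())
print(pairs)  # [(0, 1.07), (1, 2.08), (2, 'ab')]

first = pairs[0]
tuple_is_float = isinstance(first, float)     # the tuple itself is not a float
value_is_float = isinstance(first[1], float)  # the value inside it is
print(tuple_is_float, value_is_float)  # False True
```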

score 0 · Answer 5 · answered Nov 21 '18 at 19:13

You can do something like this:

from pandas import DataFrame as df

columns = ['feature1', 'feature2', 'feature3']
data = [[123, 1.07, 1],
 [231, 2.08, 3],
 [122, 'ab', 4],
 [111, 3.04, 6],
 [555, 'cde', 8]]

df_ = df(data, columns=columns)
types = []
for k in df_:
    a = set(type(m) for m in df_[k])
    if len(a) > 1:
        types.append({k: 'object'})
    else:
        types.append({k: str(list(a)[0].__name__)})

print(types)

Output:

[{'feature1': 'int'}, {'feature2': 'object'}, {'feature3': 'int'}]
