3

I am trying to write a function equivalent to Excel's ISNUMBER function, applied to a whole column.

dataset:

feature1 feature2 feature3
  123       1.07     1
  231       2.08     3
  122        ab      4
  111       3.04     6
  555        cde     8

feature1: integer dtype
feature2: object dtype
feature3: integer dtype

I tried this piece of code

for item in df.feature2.iteritems():
    if isinstance(item, float):
       print('yes')
    else:
       print('no')

I got the result as

 no
 no
 no
 no
 no

But I want the result to be

yes
yes
no
yes
no

When I checked the type of the individual feature2 values, this is what I saw

type(df.feature2[0]) = str
type(df.feature2[1]) = str
type(df.feature2[2]) = str
type(df.feature2[3]) = str
type(df.feature2[4]) = str

Clearly the values at indices 0, 1 and 3 should be floats, but they show up as str.

What am I doing wrong?

Sai Sumanth

5 Answers

1

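iteritems() yields (index, value) tuples such as (0, '1.07'), and a tuple is never a float. Since you want to loop over each value, try the code below: you just need to remove .iteritems(). Note that this example assigns real floats to the column; values stored as strings such as '1.07' would still print 'no'.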

df['feature2'] = [1.07, 2.08, 'ab', 3.04, 'cde']
for item in df.feature2:
    if isinstance(item, float):
        print('yes')
    else:
        print('no')

Here is your output:

yes
yes
no
yes
no
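
If the column really holds strings such as '1.07' (which is what the type() checks in the question suggest), isinstance(item, float) will still be False for every row. The following is a minimal sketch of one possible ISNUMBER-style check for that case, assuming pandas is available; the names converted and is_number are only illustrative.

```python
import pandas as pd

# Assumption: feature2 is an object column of strings, as in the question.
df = pd.DataFrame({"feature2": ["1.07", "2.08", "ab", "3.04", "cde"]}, dtype=object)

# errors="coerce" turns every value that cannot be parsed as a number into NaN.
converted = pd.to_numeric(df["feature2"], errors="coerce")

# notna() is True wherever the conversion succeeded.
is_number = converted.notna()

for flag in is_number:
    print("yes" if flag else "no")
```

One caveat of this approach: a value that is already missing (NaN) is reported as 'no', the same as unparseable text.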
Ankur Gulati
1

I think there are two things you need to consider here:

  1. Methods for Dict vs DataFrame
  2. Difference between dtype (array-scalar types) and type (built-in Python types) - Reference (https://numpy.org/devdocs/reference/arrays.dtypes.html)

Point 1:

.iteritems() / .items() are best known as dictionary methods. A pandas Series also has .items(), which yields (index, value) pairs, but judging by the data you've provided you are looping over a DataFrame column, and you don't need it to go through each value. Side note: dict.iteritems() was removed in Python 3 and replaced by .items() (see discussion: When should iteritems() be used instead of items()?), and Series.iteritems() was later deprecated and then removed in pandas 2.0.
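
To make Point 1 concrete, here is a small sketch of what .items() produces on a pandas Series. It assumes a recent pandas version (where .items() is the available spelling), and the variable names are illustrative only.

```python
import pandas as pd

s = pd.Series(["1.07", "2.08", "ab"], name="feature2", dtype=object)

# Each item is an (index, value) tuple, not the value itself.
pairs = list(s.items())
print(pairs)  # [(0, '1.07'), (1, '2.08'), (2, 'ab')]

# A tuple is never a float, so this check fails for every row.
tuple_checks = [isinstance(item, float) for item in s.items()]

# Unpacking the tuple gives access to the value on its own.
values = [value for _, value in s.items()]
```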

Point 2:

When using NumPy or pandas, the data type of a column (or array) is called its dtype. This is different from the built-in Python type of each individual value, which Python refers to as just type. You can use the table under the "Pandas Data Types" heading to map each dtype to a type (Ref: https://pbpython.com/pandas_dtypes.html)
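
As a rough illustration of Point 2, the sketch below compares the dtype of a column with the Python type of each value stored in it. The small DataFrame is an assumption modelled on the question's data.

```python
import pandas as pd

# Assumption: a mixed column like feature2, holding floats and strings.
df = pd.DataFrame({"feature2": [1.07, 2.08, "ab"]})

# The dtype describes the column as a whole; mixed values give "object".
column_dtype = df["feature2"].dtype
print(column_dtype)

# type() describes each individual value stored in the column.
element_types = [type(value).__name__ for value in df["feature2"]]
print(element_types)
```

An object dtype therefore says nothing about whether a particular value is a number; each value has to be inspected (or converted) individually.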

Now, in response to your question, this bit of code should solve your issue:

import pandas as pd

columns = ['feature1', 'feature2', 'feature3']
data = [[123, 1.07, 1],
        [231, 2.08, 3],
        [122, 'ab', 4],
        [111, 3.04, 6],
        [555, 'cde', 8]]

df = pd.DataFrame(data, columns=columns)

for value in df.feature2:
    if isinstance(value,float):
        print('yes')
    else:
        print('no')
MTay
0

Try this:

for i in range(len(df["feature2"])):
    test = df.loc[i,"feature2"]
    if isinstance(test, float):
        print('yes')
    else:
        print('no')
  • Bear in mind that this only tests for floats. If you want any number, float or integer, you'd have to change the third line to if isinstance(test, (float, int)): – Ellie Hanna Nov 21 '18 at 18:39
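
A possible way to write the combined float-or-integer check from the comment above is sketched here. The helper name is_number and the sample column are assumptions for illustration; the bool exclusion is an addition, needed because bool is a subclass of int in Python.

```python
import pandas as pd

# Assumption: an object column mixing floats, ints, strings and a bool.
df = pd.DataFrame({"feature2": [1.07, 2, "ab", 3.04, True]})

def is_number(value):
    """Return True for Python ints and floats, but not for bools."""
    # bool is a subclass of int, so it has to be excluded explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)

results = ["yes" if is_number(value) else "no" for value in df["feature2"]]
print(results)
```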
0

This is because iteritems() yields tuples of the form (index, value). So you are checking whether, for example, (0, '1.07') or (1, '2.08') is of type float, which of course they aren't.

It should work if you change df.feature2.iteritems() to df.feature2.values :)
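
If the values in the column are strings, as the type() output in the question indicates, iterating over .values alone will still print 'no' everywhere. One option, sketched below under that assumption, is to test whether each value can be converted with float(). The helper parses_as_float is a hypothetical name.

```python
import pandas as pd

# Assumption: the column stores text, e.g. after reading a messy CSV file.
s = pd.Series(["1.07", "2.08", "ab", "3.04", "cde"], dtype=object)

def parses_as_float(value):
    """Return True if float() accepts the value, False otherwise."""
    try:
        float(value)
    except (TypeError, ValueError):
        return False
    return True

results = ["yes" if parses_as_float(value) else "no" for value in s.values]
print(results)
```

Note that float() also accepts strings such as 'nan' and 'inf', so those would count as numbers with this check.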

hmajid2301
0

You can do something like this:

import pandas as pd

columns = ['feature1', 'feature2', 'feature3']
data = [[123, 1.07, 1],
        [231, 2.08, 3],
        [122, 'ab', 4],
        [111, 3.04, 6],
        [555, 'cde', 8]]

df = pd.DataFrame(data, columns=columns)
types = []
for column in df:
    # Collect the distinct Python types found in this column.
    found = set(type(value) for value in df[column])
    if len(found) > 1:
        # More than one type means the column is mixed.
        types.append({column: 'object'})
    else:
        types.append({column: found.pop().__name__})

print(types)

Output:

[{'feature1': 'int'}, {'feature2': 'object'}, {'feature3': 'int'}]
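
For comparison, pandas already exposes similar column-level information through df.dtypes. The sketch below rebuilds the same data and collects the dtype names; dtype_names is an illustrative variable, and the exact integer dtype name may vary by platform and version.

```python
import pandas as pd

columns = ['feature1', 'feature2', 'feature3']
data = [[123, 1.07, 1],
        [231, 2.08, 3],
        [122, 'ab', 4],
        [111, 3.04, 6],
        [555, 'cde', 8]]

df = pd.DataFrame(data, columns=columns)

# df.dtypes is a Series mapping each column name to its dtype.
dtype_names = {column: str(dtype) for column, dtype in df.dtypes.items()}
print(dtype_names)
```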
Chiheb Nexus