I am confused by type conversion in Python pandas.

df = pd.DataFrame({'a':['1.23', '0.123']})
type(df['a'])
df['a'].astype(float)

Here df['a'] is a pandas Series whose contents are two strings. I can apply astype(float) to this Series, and it correctly converts all the strings to floats. However,

df['a'][1].astype(float)

gives me AttributeError: 'str' object has no attribute 'astype'. My question is: how can that be? I can convert the whole Series from string to float, but I cannot convert a single entry of that Series from string to float?

Also, I load my raw data set and run

df['id'].astype(int)

This raises ValueError: invalid literal for int() with base 10: ''. That seems to suggest there is an empty string in df['id'], so I checked whether that is true by typing

'' in df['id']

It returns False, so I am very confused.

KevinKim

4 Answers

df['a'] returns a Series object, which has astype as a vectorized way to convert all of its elements to another type.

df['a'][1] returns the content of one cell of the dataframe, in this case the string '0.123'. That is a plain str object, which has no astype method. To convert it, use the regular Python built-in:

type(df['a'][1])
Out[25]: str

float(df['a'][1])
Out[26]: 0.123

type(float(df['a'][1]))
Out[27]: float
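
To put the two cases side by side, here is a minimal sketch using the question's own dataframe. The variable names (as_floats, cell, cell_as_float) are only illustrative, and .loc[1] is used as a label-based equivalent of the [1] lookup above.

```python
import pandas as pd

df = pd.DataFrame({'a': ['1.23', '0.123']})

# Series-level conversion: astype is a Series method and returns a new Series.
as_floats = df['a'].astype(float)

# Element-level conversion: the cell is a plain Python str, so use float().
cell = df['a'].loc[1]
cell_as_float = float(cell)

print(type(df['a']).__name__)   # Series
print(type(cell).__name__)      # str
print(as_floats.dtype)          # float64
print(cell_as_float)            # 0.123
```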

As for your second question, the in operator ultimately calls __contains__ on the series with '' as the argument. Here is the docstring of that method:

help(pd.Series.__contains__)
Help on function __contains__ in module pandas.core.generic:

__contains__(self, key)
    True if the key is in the info axis

It means that the in operator searches for your empty string in the index, not in the values.

The way to search for empty strings in the values is to use the equality operator:

df
Out[54]: 
    a
0  42
1    

'' in df
Out[55]: False

df==''
Out[56]: 
       a
0  False
1   True

df[df['a']=='']
Out[57]: 
  a
1  
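
The same distinction can be sketched on a single Series. This is only an illustration with made-up variable names; it assumes the default integer index (0, 1) that pandas assigns when none is given.

```python
import pandas as pd

s = pd.Series(['42', ''])      # default integer index: 0, 1

in_index = '' in s              # membership test against the index labels
label_in_index = 1 in s         # 1 is an index label, so this is True
in_values = '' in s.values      # membership test against the underlying array
any_empty = (s == '').any()     # elementwise comparison, then reduce

print(in_index, label_in_index, in_values, any_empty)
```

The elementwise form (s == '') also gives a boolean mask that can be used to select the offending rows, as in df[df['a']==''] above.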
Zeugma
  • thanks! I have a short followup question. So in your example `df`, if I want to check whether number 42 is in the df, I should not use `42 in df` or `42 in df['a']` or `42 in df[['a']]` right? the `in` is checking the index of a pandas series? but what about `df[['a']]`? it is a pandas dataframe. So `in` when operating on a dataframe is still checking the index? – KevinKim Jan 29 '17 at 05:19
  • Same mechanics for a dataframe. So do df==42 – Zeugma Jan 29 '17 at 05:45

df['a'][1] will return the actual value inside the array, at the position 1, which is in fact a string. You can convert it by using float(df['a'][1]).

>>> df = pd.DataFrame({'a':['1.23', '0.123']})
>>> type(df['a'])
<class 'pandas.core.series.Series'>
>>> df['a'].astype(float)
0    1.230
1    0.123
Name: a, dtype: float64
>>> type(df['a'][1])
<type 'str'>

For the second question, you may have an empty value in your raw data. The correct test would be:

>>> df = pd.DataFrame({'a':['1', '']})
>>> '' in df['a'].values
True

Source for the second question: https://stackoverflow.com/a/21320011/5335508
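
Once the empty strings are confirmed, one possible way to get past the ValueError is sketched below. The ids Series is a hypothetical stand-in for the real df['id'] column, which is not shown in the question, and pd.to_numeric with errors='coerce' is one option among several rather than the required fix.

```python
import pandas as pd

# Hypothetical stand-in for the raw 'id' column; the real data is not shown.
ids = pd.Series(['1', '', '3'])

# Locate the rows holding empty (or whitespace-only) strings.
empty_mask = ids.str.strip() == ''
bad_rows = ids[empty_mask]

# Convert what can be converted; unparseable entries become NaN.
ids_numeric = pd.to_numeric(ids, errors='coerce')

print(bad_rows.index.tolist())
print(ids_numeric)
```

Because NaN is a float, the coerced column comes back as floating point rather than int, so the missing ids still have to be dropped or filled before a plain astype(int) will succeed.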

Community
import numpy as np
import pandas as pd

data1 = {'age': [1, 1, 2, np.nan],
         'gender': ['m', 'f', 'm', np.nan],
         'salary': [2, 1, 2, np.nan]}

x = pd.DataFrame(data1)
# Inspect the type of the second entry in each column.
for col in x.columns:
    value = x[col].iloc[1]
    print(type(value))
    if isinstance(value, str):
        print("It is String")
    else:
        print('Not a String')
kamran kausar
  • Posting code without any explanation isn't welcome here. Please edit your post. – Cid Jun 22 '18 at 09:40
In addition to the solutions already posted you could also simply use:

df['a'].astype(float)[1]
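
A short sketch of what that one-liner does, using the question's dataframe. The intermediate names are illustrative only. Note that astype returns a new Series, so the original column keeps its strings until the result is assigned back.

```python
import pandas as pd

df = pd.DataFrame({'a': ['1.23', '0.123']})

converted = df['a'].astype(float)   # new Series; df itself is unchanged
value = converted.loc[1]            # label-based lookup of one element

original_cell = df['a'].loc[1]      # still the string '0.123'

# Assign back if the column itself should hold floats from now on.
df['a'] = df['a'].astype(float)

print(value, repr(original_cell), df['a'].dtype)
```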