I am importing a large chunk of data from a CSV file. Pandas automatically assigned a dtype to each of the 138 columns, and there is one column I need to compare against in order to extract a result.

I tried the astype() and apply() methods to change the column's dtype, without success.

This is what I tried:

In [78]: df['PDP_ADDIPV4_01']=df['PDP_ADDIPV4_01'].astype(str,errors='ignore')
         df['PDP_ADDIPV4_01'].dtype
Out[78]: dtype('O')

In [79]: df['PDP_ADDIPV4_01']=df['PDP_ADDIPV4_01'].astype('str',errors='ignore')
         df['PDP_ADDIPV4_01'].dtype
Out[79]: dtype('O')


In [49]: df['PDP_ADDIPV4_01'].dtype
Out[49]: dtype('O')

In [50]: df['PDP_ADDIPV4_01']=df['PDP_ADDIPV4_01'].astype(int,errors='ignore')
         df['PDP_ADDIPV4_01'].dtype
Out[50]: dtype('O')

In [51]: df['PDP_ADDIPV4_01']=df['PDP_ADDIPV4_01'].astype('int',errors='ignore')
         df['PDP_ADDIPV4_01'].dtype
Out[51]: dtype('O')

No error is raised, but the column's dtype does not change either, whether I ask for integer or string.

dannisis

1 Answer

If you use pandas 0.24+ and need integer columns, you can convert non-numeric values to missing values with to_numeric and then convert to integers with the nullable integer data type:

df['PDP_ADDIPV4_01']= pd.to_numeric(df['PDP_ADDIPV4_01'],errors='coerce').astype('Int64')
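
As a rough illustration of what that line does, here is a minimal sketch on a tiny made-up column (the sample values are hypothetical, not taken from the asker's CSV). Note that converting to integers drops the leading zeros, so '00000000' becomes 0:

```python
import pandas as pd

# Hypothetical sample values; the real column comes from the CSV file.
df = pd.DataFrame({'PDP_ADDIPV4_01': ['00000000', '12', 'abc']})

# errors='coerce' turns anything that is not a number ('abc') into NaN;
# the nullable 'Int64' dtype can then hold integers alongside missing values.
converted = pd.to_numeric(df['PDP_ADDIPV4_01'], errors='coerce').astype('Int64')

print(converted.dtype)
print(converted)
```

If the leading zeros matter for the comparison, keeping the column as strings is probably the safer choice.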

But if you want strings and the dtype is already object, then the values are already stored as strings (object is the dtype pandas uses for them), so no conversion is necessary.
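
To make this concrete, here is a small sketch with hypothetical sample values, assuming the column really contains Python strings. It checks the element types and shows that a string column has to be compared against a string literal: an unquoted 00000000 is just the integer 0 in Python, and no string is equal to it.

```python
import pandas as pd

# Hypothetical sample values stored with object dtype, as in the question.
df = pd.DataFrame({
    'PDP_ADDIPV4_01': pd.Series(['00000000', '0A0B0C0D', '00000000'], dtype=object)
})
col = df['PDP_ADDIPV4_01']

# Inspect the Python type of each element rather than the column dtype.
print(col.map(type).unique())

# Comparing against the integer 0 marks every row as "different",
# because a str is never equal to an int.
print((col != 0).all())

# Comparing against the string keeps only the rows that really differ.
mask = col != '00000000'
print(df.loc[mask, 'PDP_ADDIPV4_01'])
```

If the type check reports something other than str for some rows, the column is mixed and would need cleaning before a string comparison is reliable.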

jezrael
  • No, I think you should use `to_numeric` and then `astype('Int64')`, and obviously specify that this is for 0.24+ only. – cs95 Jun 26 '19 at 05:26
  • I tried to change the dtype to string or integer; for the comparison I need to convert to string. – dannisis Jun 26 '19 at 05:41
  • @user3118330 - Then converting to integer is not necessary, because `O` is obviously string; check [this](https://stackoverflow.com/a/37562101) and [this](https://stackoverflow.com/questions/42672552/pandas-cast-column-to-string-does-not-work/42672574#42672574) – jezrael Jun 26 '19 at 05:47
  • The result is not what I expect:

        df['PDP_ADDIPV4_01'].head()
        0    00000000
        1    00000000
        2    00000000
        3    00000000
        4    00000000
        Name: PDP_ADDIPV4_01, dtype: object

        df.loc[df['PDP_ADDIPV4_01']!=00000000,'PDP_ADDIPV4_01']
        0    00000000
        1    00000000
        2    00000000
        3    00000000
        4    00000000
        5    00000000

    – dannisis Jun 26 '19 at 06:24