
Please look at the code and output below.

May I know why the data type of the *_state columns is float instead of int, and how can I cast those columns to int?

Thanks,

Code

import pandas as pd

df_test = pd.DataFrame({'Message': ['Door_Started', 'Light_open'],
                        'name': ['Door', 'Light'], 'value': [1, 1]})

print(df_test)
for idx, row in df_test.iterrows():
    print(type(row['value']))
    # write each row's value into a per-name state column
    df_test.at[idx, row['name'] + '_state'] = row['value']
print(df_test)

Output

        Message   name  value
0  Door_Started   Door      1
1    Light_open  Light      1

<type 'int'>
<type 'int'>

        Message   name  value  Door_state  Light_state
0  Door_Started   Door      1         1.0          NaN
1    Light_open  Light      1         NaN          1.0
Joseph

2 Answers


You are only assigning an integer to a single column, row['name'] + '_state', on each iteration. For any given index, this leaves NaN values in the other *_state column(s).

NaN is considered float (see here for why), so a mixture of int and NaN values will always be upcast to float¹ for any given series. You can check this for yourself:

import numpy as np

type(np.nan)  # float

This usually does not break subsequent manipulations or calculations, and it is efficient to keep your series as float. Converting such a series to int is not possible while it contains NaN, and workarounds are inefficient. Therefore, I advise you to do nothing.
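For illustration, a minimal sketch of the upcast, plus a fillna / astype workaround should you ever really need ints and a filler value such as 0 is acceptable:

import numpy as np
import pandas as pd

s = pd.Series([1, np.nan])
print(s.dtype)                         # float64: int + NaN is upcast to float

# only if you must have ints: replace NaN first, then cast
print(s.fillna(0).astype(int).dtype)   # an integer dtype, e.g. int64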


¹ This accommodative behaviour is described in the docs:

Note: When working with heterogeneous data, the dtype of the resulting ndarray will be chosen to accommodate all of the data involved. For example, if strings are involved, the result will be of object dtype. If there are only floats and integers, the resulting array will be of float dtype.

jpp

Use this after the code:

pd.options.display.float_format = '{:,.0f}'.format  # display floats without decimal places
print(df_test)

@jpp is correct there. This only changes the display, so the values print as 1 instead of 1.0; the underlying dtype is still float.

Also, if you use this solution, make sure you read about pd.reset_option too: https://pandas.pydata.org/pandas-docs/stable/options.html
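For example, a brief sketch of restoring the default float formatting afterwards:

pd.reset_option('display.float_format')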

anky