
My understanding is that the Feather format's advantage is that it preserves types. So I expected the `object` dtype of the `state` column to be preserved, but it isn't. Why? Is there a way around this?

```python
import sys
import pandas
from pandas import Timestamp
print(pandas.__version__)
## 1.3.4
print(sys.version)
## 3.9.7 (default, Sep 16 2021, 08:50:36) 
## [Clang 10.0.0 ]


d = pandas.DataFrame({'Date': {0: Timestamp('2020-12-01 00:00:00'), 1: Timestamp('2020-11-01 00:00:00'), 2: Timestamp('2020-10-01 00:00:00'), 3: Timestamp('2020-09-01 00:00:00'), 4: Timestamp('2020-08-01 00:00:00')}, 'state': {0: 1, 1: 1, 2: 1, 3: 1, 4: 1}, 'value': {0: 3.1, 1: 3.4, 2: 3.9, 3: 5.9, 4: 6.4}})

d.dtypes
# Date     datetime64[ns]
# state             int64
# value           float64
# dtype: object

d["state"] = d["state"].astype(object)

d.dtypes
# Date     datetime64[ns]
# state            object
# value           float64
# dtype: object

d.to_feather("test.feather")

d = pandas.read_feather("test.feather")
d.dtypes
# Date     datetime64[ns]
# state             int64
# value           float64
# dtype: object
```

I want `state` to be a string or `object` column, not `int64`. I don't want to have to recast it every time I load the dataframe. Thanks!

PatrickT
    Did you try `d["state"] = d["state"].astype(str)`? – Quang Hoang Nov 15 '21 at 17:31
  • No, I haven't. I was under the impression that `object` was the correct label for strings. I have also come across a reference to a new way of handling strings in newer versions of pandas, which I haven't tried either. Let me try it now. Thanks. – PatrickT Nov 15 '21 at 17:43
  • Wow. That appears to work. The `dtype` is still `object` (as opposed to `string` or something). Confusing... If you can explain that, I'll tick your answer. Thanks! – PatrickT Nov 15 '21 at 17:47
  • `astype(object)` does not change the values themselves (they are still integers); it only changes how pandas labels the series. `astype(str)`, on the other hand, converts the underlying data to strings and labels the series as `object`. – Quang Hoang Nov 15 '21 at 17:59
  • I see. Is this something I ought to have known? I noticed that elsewhere in my code I had used `astype(str)` and hadn't run into the problem, but I had since forgotten and typed `astype(object)` because that is the name listed by `df.dtypes`. Please do make it an answer. :-) – PatrickT Nov 15 '21 at 18:14
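To make Quang Hoang's point concrete, here is a minimal pandas-only sketch (the small Series `s` is a made-up stand-in for the `state` column) that inspects the Python type of the stored elements after each conversion. `astype(object)` keeps the values as integers, merely boxed as Python `int` objects, whereas `astype(str)` turns each value into the string `"1"`. The `object` label is pandas' dtype for "arbitrary Python objects"; it says nothing about what kind of object is stored inside.

```python
import pandas as pd

# Hypothetical stand-in for the question's `state` column.
s = pd.Series([1, 1])

as_obj = s.astype(object)  # same integer values, now boxed as Python ints
as_str = s.astype(str)     # each value converted to a Python str

# dtype label vs. the actual type of the first stored element
print(as_obj.dtype, type(as_obj.iloc[0]).__name__)  # object int
# On pandas 1.x the dtype prints as object; newer pandas may report a string dtype.
print(as_str.dtype, type(as_str.iloc[0]).__name__)
```

Both columns can carry the `object` label, yet only the second one actually holds strings, which is what matters once the data leaves pandas.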

1 Answer


A while back Quang Hoang suggested in the comments that the following works:

```python
d["state"] = d["state"].astype(str)
```

I have no explanation to offer. I'll be happy to select any other, better answer.

PatrickT