I'm creating a table for a unit-test. In one scenario I'd like to test what happens if all the values in one column are Null, but of the correct type. If I create a test-table like this:
import pandas

df0 = pandas.DataFrame(
    columns=["a", "b", "key"],
    data=[{"a": 4, "b": 5, "key": 1}, {"a": 3, "b": 9, "key": 2}],
)
... then the type of column b is inferred from the data. If every value in b is None, though, the type defaults to object, simply because there's no information for pandas to know what kind of column it's supposed to be.
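To make the fallback concrete, here is a minimal sketch of the all-None case. The name df_nulls is only illustrative, and the exact dtypes shown for the other columns may depend on the pandas version and platform.

```python
import pandas

# Hypothetical frame (df_nulls is an illustrative name): every value in "b" is None.
df_nulls = pandas.DataFrame(
    columns=["a", "b", "key"],
    data=[{"a": 7, "b": None, "key": 3}, {"a": 7, "b": None, "key": 1}],
)

# With no non-null values to infer from, "b" falls back to the generic object dtype.
print(df_nulls.dtypes)
b_is_object = df_nulls["b"].dtype == object
```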
Is there a syntax I can use to tell pandas what the type of the column ought to be? According to the docs, something like this ought to work:
df1 = pandas.DataFrame(
    columns=["a", "b", "key"],
    data=[{"a": 7, "b": None, "key": 3}, {"a": 7, "b": None, "key": 1}],
    dtype={"b": int}
)
Unfortunately, that gives an error:
TypeError: object of type 'type' has no len()
So what's the correct way to do this? Ideally I'd like to do it in a single statement, but it's OK to create the table and then set the type.
Update 0:
I tried this, thanks to @anky's suggestion:
import numpy

df1 = pandas.DataFrame(
    columns=["a", "b", "key"],
    data=[{"a": 7, "b": None, "key": 3}, {"a": 7, "b": None, "key": 1}]
).astype(dtype={"b": numpy.int64})
But I get this error:
TypeError: int() argument must be a string, a bytes-like object or a number, not 'NoneType'
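My understanding of this error is that a plain NumPy integer column has no way to represent a missing value, so the cast has nothing to convert None into. The sketch below reproduces that on a single column and contrasts it with a float cast, where None becomes NaN. The names s, int_cast_failed and as_float are illustrative only.

```python
import pandas

# Illustrative object-dtype column holding only None.
s = pandas.Series([None, None], dtype=object)

# Casting to a NumPy integer dtype fails because None has no integer representation.
int_cast_failed = False
try:
    s.astype("int64")
except (TypeError, ValueError) as exc:
    int_cast_failed = True
    print("int64 cast failed:", exc)

# Casting to float works, because None can be represented as NaN.
as_float = s.astype("float64")
print(as_float)
```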
Update 1:
This syntax, based on @Aditya K's suggestion, isn't quite right either:
df1 = pandas.DataFrame(
    columns=["a", "b", "key"],
    data=[{"a": 7, "b": None, "key": 3}, {"a": 7, "b": None, "key": 1}],
    dtype={"b": numpy.int64}
)
It gives this error:
TypeError: object of type 'type' has no len()
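As far as I can tell, the constructor's dtype parameter accepts only a single dtype for the whole frame, not a per-column mapping, which would explain why a dict is rejected. One single-statement alternative, sketched here under that assumption, is to build each column as a Series with its own dtype. The name df_typed is illustrative, and the nullable "Int64" dtype is used for the all-None column.

```python
import pandas

# Each column carries its own dtype, so nothing has to be inferred from the None values.
df_typed = pandas.DataFrame(
    {
        "a": pandas.Series([7, 7], dtype="int64"),
        "b": pandas.Series([None, None], dtype="Int64"),  # nullable integer dtype
        "key": pandas.Series([3, 1], dtype="int64"),
    }
)

print(df_typed.dtypes)
```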
Solution
Thanks to @Anky for this solution:
df1 = pandas.DataFrame(
    columns=["a", "b", "key"],
    data=[{"a": 7, "b": None, "key": 3}, {"a": 7, "b": None, "key": 1}],
).astype({"b": "Int64"})
This gives the desired column types:
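For the unit test itself, a small check along these lines can confirm the result. It is only a sketch that reuses the frame from the solution; the assertions are my own and not part of the suggested answer.

```python
import pandas

df1 = pandas.DataFrame(
    columns=["a", "b", "key"],
    data=[{"a": 7, "b": None, "key": 3}, {"a": 7, "b": None, "key": 1}],
).astype({"b": "Int64"})

# Inspect the resulting column types.
print(df1.dtypes)

# "b" is now a nullable integer column whose values are all missing.
assert str(df1["b"].dtype) == "Int64"
assert df1["b"].isna().all()
```

Note the capital "I": "Int64" is the pandas nullable integer dtype, whereas "int64" is the NumPy dtype that cannot hold missing values.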