
I'm creating a table for a unit-test. In one scenario I'd like to test what happens if all the values in one column are Null, but of the correct type. If I create a test-table like this:

df0 = pandas.DataFrame(
        columns=["a", "b", "key"],
        data=[{"a": 4, "b": 5, "key": 1}, {"a": 3, "b": 9, "key": 2}],
    )

... then pandas infers an integer type for column b from the data. If every value in b is None, however, the column defaults to object, simply because pandas has nothing from which to infer what kind of column it's supposed to be.

Is there a syntax I can use to tell pandas what the type of the column ought to be? According to the docs, something like this ought to work:

df1 = pandas.DataFrame(
        columns=["a", "b", "key"],
        data=[{"a": 7, "b": None, "key": 3}, {"a": 7, "b": None, "key": 1}],
        dtype={"b":int}
    )

Unfortunately, that gives an error:

TypeError: object of type 'type' has no len()

So what's the correct way to do this? Ideally I'd like to do it in a single statement, but it's OK to create the table and then set the type.
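One likely reading of this error is that the `dtype` argument of the `DataFrame` constructor accepts a single dtype for the whole frame, not a per-column mapping. As a minimal sketch of a single-statement alternative (assuming pandas' nullable `"Int64"` dtype is acceptable for the test), each column can be built as a `Series` with its own dtype:

```python
import pandas as pd

# Build each column as a Series with an explicit dtype, then assemble the frame.
# "Int64" (capital I) is pandas' nullable integer dtype; it can hold missing values.
df1 = pd.DataFrame(
    {
        "a": pd.Series([7, 7], dtype="int64"),
        "b": pd.Series([None, None], dtype="Int64"),
        "key": pd.Series([3, 1], dtype="int64"),
    }
)

print(df1.dtypes)
```

This trades the list-of-dicts layout for a column-oriented one, which may or may not suit the test fixture.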

Update 0:

I tried this, thanks to @anky's suggestion:

df1 = pandas.DataFrame(
        columns=["a", "b", "key"],
        data=[{"a": 7, "b": None, "key": 3}, {"a": 7, "b": None, "key": 1}]
    ).astype(dtype={"b":numpy.int64})

But I get this error:

TypeError: int() argument must be a string, a bytes-like object or a number, not 'NoneType'
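The failure seems to come from the target type rather than from `astype` itself: NumPy's `int64` has no way to represent a missing value. The sketch below isolates column b as a standalone `Series` (the names `b`, `numpy_cast_failed` and `nullable` are only for illustration) and contrasts the NumPy cast with pandas' nullable integer dtype:

```python
import numpy as np
import pandas as pd

# Column "b" as pandas infers it from the dicts: object dtype holding None.
b = pd.Series([None, None], dtype=object)

# NumPy's int64 cannot represent a missing value, so the cast fails.
try:
    b.astype(np.int64)
    numpy_cast_failed = False
except (TypeError, ValueError):
    numpy_cast_failed = True

# pandas' nullable extension dtype tracks missing entries with a separate mask.
nullable = b.astype(pd.Int64Dtype())

print(numpy_cast_failed, nullable.dtype)
```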

Update 1:

This syntax based on @Aditya K's suggestion isn't quite right either:

df1 = pandas.DataFrame(
        columns=["a", "b", "key"],
        data=[{"a": 7, "b": None, "key": 3}, {"a": 7, "b": None, "key": 1}],
        dtype={"b":numpy.int64}
    )

Gives this error:

TypeError: object of type 'type' has no len()

Solution

Thanks to @anky for this solution:

df1 = pandas.DataFrame(
        columns=["a", "b", "key"],
        data=[{"a": 7, "b": None, "key": 3}, {"a": 7, "b": None, "key": 1}],
    ).astype({"b":"Int64"})

This gives the desired column types: b now has the nullable Int64 dtype, and its missing values display as <NA>.
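Since the table is meant for a unit test, the result can also be checked programmatically. This is only a sketch of the kind of assertions that might go in the test; it uses the same data as above:

```python
import pandas as pd

df1 = pd.DataFrame(
    columns=["a", "b", "key"],
    data=[{"a": 7, "b": None, "key": 3}, {"a": 7, "b": None, "key": 1}],
).astype({"b": "Int64"})

# The column has the nullable integer dtype and every entry is missing.
assert df1["b"].dtype == pd.Int64Dtype()
assert df1["b"].isna().all()

# The other columns are untouched by the per-column cast.
assert pd.api.types.is_integer_dtype(df1["a"])
assert pd.api.types.is_integer_dtype(df1["key"])
```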

Salim Fadhley
  • Then you can use `astype` with a dict. However, you would need `'Int64'` instead of `int`, since `int` does not support None values. Something like: `pd.DataFrame(columns=["a", "b", "key"],data=[{"a": 7, "b": None, "key": 3}, {"a": 7, "b": None, "key": 1}], ).astype({"b":'Int64'})` – anky Apr 28 '21 at 14:10
  • How about setting dtype as float32, like this: `pandas.DataFrame(columns=["a", "b", "key"], data=[{"a": 7, "b": None, "key": 3}, {"a": 7, "b": None, "key": 1}], dtype=np.float32 )` – Aditya Apr 28 '21 at 14:13
  • @anky, something seems a bit wrong; can you see my first update? – Salim Fadhley Apr 28 '21 at 14:29
  • That is not what I suggested in my comment. You should use either `{"b":"Int64"}` or `pd.Int64Dtype()`; this is not the same as `np.int64`. Check this: https://pandas.pydata.org/pandas-docs/stable/user_guide/integer_na.html – anky Apr 28 '21 at 14:31
  • That is also not what I suggested in my comment. It was `np.float32` and not `{'b':np.float32}` – Aditya Apr 28 '21 at 14:44
  • @AdityaK, but I only want to cast a single column. Surely I need to use the dict-style syntax? – Salim Fadhley Apr 28 '21 at 14:54
  • @anky thank you, kindly add your solution as an answer and I will mark it as correct. – Salim Fadhley Apr 28 '21 at 15:01
  • If this [answers](https://stackoverflow.com/questions/11548005/numpy-or-pandas-keeping-array-type-as-integer-while-having-a-nan-value) your question, we can close it; otherwise you can answer the question yourself :) I am glad if I helped you. Thanks – anky Apr 28 '21 at 15:18
