I wrote this Python function to check the data type of each column of a DataFrame and convert it to the type it is supposed to be.

def dataType(df):
    '''This function makes sure the data types of all 7 columns of the final dataframe are correct'''

    # Day as date type
    df["Day"] = pd.to_datetime(df["Day"])
    df["Day"] = df["Day"].dt.strftime("%m/%d/%Y")

    # Channel, Partner as string type
    df[["Channel", "Partner"]] = df[["Channel", "Partner"]].astype(str)

    # Spend, Clicks, Impressions, Platform Conversions as float type
    df['Platform Conversions'] = df['Platform Conversions'].where(df['Platform Conversions'] == "null", 0)
    df[["Spend", "Clicks", "Impressions", "Platform Conversions"]] = df[["Spend", "Clicks", "Impressions", "Platform Conversions"]].astype(float)

    return df

I then tested each column with code like this, but every column is still reported as object type. What am I doing wrong?

print(df["Day"].dtypes)
Claie
  • Maybe you could try building a new df from scratch, adding columns (which will be astyped to their appropriate type) one at a time. – Mark Lavin Oct 07 '21 at 22:12
  • @MarkLavin, yes that's actually what I originally did. I'm attempting to do this (create a definition) because it would save at least 200 lines of code. So is there a way to create the desired function? – Claie Oct 12 '21 at 20:18

1 Answer

Check the documentation: https://pandas.pydata.org/docs/reference/api/pandas.Series.dt.strftime.html

df["Day"] = df["Day"].dt.strftime("%m/%d/%Y")

does not return datetime values. It returns strings, so the column goes back to object dtype.

I tested dt.strftime outside a function, which is equivalent:

import pandas as pd
from io import StringIO

d = '''Day
01/01/2001
01/02/2001'''
df = pd.read_csv(StringIO(d))
df.dtypes

Day    object
dtype: object

df["Day"] = pd.to_datetime(df["Day"])
df.dtypes

Day    datetime64[ns]
dtype: object

df["Day"] = df["Day"].dt.strftime("%m/%d/%Y")
df.dtypes

Day    object
dtype: object
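One possible way around this, as a minimal sketch: keep "Day" as a datetime column and call dt.strftime only when the formatted text is actually needed (for display or export). The two-row sample and the name day_text are made up for illustration, and the explicit format argument assumes the dates are written as month/day/year.

```python
import pandas as pd
from io import StringIO

# Hypothetical two-row sample, same shape as the test above
d = '''Day
01/01/2001
01/02/2001'''
df = pd.read_csv(StringIO(d))

# Convert once and keep the datetime dtype in the DataFrame
df["Day"] = pd.to_datetime(df["Day"], format="%m/%d/%Y")
print(df["Day"].dtype)

# Format only when text is needed; the result is a separate Series of strings
day_text = df["Day"].dt.strftime("%m/%d/%Y")
print(day_text.tolist())
```

The DataFrame column stays a datetime, so date arithmetic and sorting still work, while day_text holds the formatted strings.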
EBDS
  • @Claie Kindly check the 'tick' to accept the answer if it answers your question. Thanks. – EBDS Oct 08 '21 at 00:09
  • Hi @EBDS, so how would I do the same for the other columns to their designated types then (string and float type)? – Claie Oct 12 '21 at 20:15
  • @Claie What you did is correct for converting to the respective types. The issue here is that you expected dt.strftime("%m/%d/%Y") to return a datetime, but this operation returns a string. That's why you see them as object. – EBDS Oct 13 '21 at 01:06
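
For the string and float columns, here is a rough sketch of how the whole function might look, not a definitive version. One thing to watch: Series.where keeps the values where the condition is True and replaces the rest, so the line in the question replaces every value that is not "null" with 0. The sketch below assumes that "null" (or any other non-numeric text) in "Platform Conversions" should become 0 and that dates are month/day/year. The name data_type and the sample rows are hypothetical.

```python
import pandas as pd

def data_type(df):
    """Return a copy of df with each column cast to its intended dtype."""
    df = df.copy()
    # Keep Day as datetime; calling .dt.strftime here would turn it back into text
    df["Day"] = pd.to_datetime(df["Day"], format="%m/%d/%Y")
    # Text columns show up as object (or a dedicated string dtype in newer pandas)
    df[["Channel", "Partner"]] = df[["Channel", "Partner"]].astype(str)
    # Assumption: "null" and other non-numeric text count as 0 conversions
    df["Platform Conversions"] = pd.to_numeric(
        df["Platform Conversions"], errors="coerce"
    ).fillna(0)
    numeric = ["Spend", "Clicks", "Impressions", "Platform Conversions"]
    df[numeric] = df[numeric].astype(float)
    return df

# Made-up sample rows with the seven column names from the question
sample = pd.DataFrame({
    "Day": ["01/01/2001", "01/02/2001"],
    "Channel": ["Search", "Social"],
    "Partner": ["A", "B"],
    "Spend": ["10.5", "3"],
    "Clicks": [4, 7],
    "Impressions": [100, 250],
    "Platform Conversions": ["null", "2"],
})
result = data_type(sample)
print(result.dtypes)
```

The function returns a new DataFrame, so the result has to be assigned (result = data_type(sample)); checking the dtypes of the original DataFrame afterwards would still show the old types.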