Pandas to CSV column datatype

Question

I’m using Pandas and SQL Alchemy to import data from SQL. One of the SQL columns is datetime. I then covert the SQL data into a Pandas dataframe, the datetime column is “datetime64” – which is fine. I am able to use Matplotlib to plot any of my other columns against datetime. I then covert my pandas dataframe to a csv using:

 df.to_csv('filename')

This is to save me having to keep running a rather large sql query each time i log on. If i then try to read the csv back into python and work from that, the datetime column in now of datatype “object” rather than ”datetime64”. This means Matplotlib won't let me plot other columns against datetime because the datetime column is the wrong datatype.

How do I ensure that it stays as the correct datatype during the df to csv process?

EDIT:

The comments/solutions to my original post did work in getting the column to the correct dtype. However I now have a different problem. When i plot against the "datetime" column is looks like this:

When it should be looking like this (this is how it looks when I'm working directly with the SQL data).

I assume the datetime column is still not quite in the correct dtype (even though it states it is datetime64[ns].

Did you tried to convert the column to a datetime format https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_datetime.html ? — mhannani, Jul 22 '20 at 19:21
As already noted, csv format does not have the ability to store dtypes. Your options are to select a different file type that has the ability to store dtypes, or, as shown in the duplicate, specify the column to be parsed as a datetime when reading the file with `.read_csv()`. — Trenton McKinney, Jul 22 '20 at 19:29
By the looks of your graph, only the data for the first third or so of each month is messed up. This is probably because those dates are written to CSV in dd-mm-yyyy format, while pandas (or really the C++ library pandas uses) defaults to mm-dd-yyyy. Try using `pd.read_csv(...., dayfirst=True)` to parse it again. — Swier, Jul 23 '20 at 08:59

score 1 · Answer 1 · answered Jul 22 '20 at 19:24

CSV is a plain text format and does not specify the data type of any column. If you are using pandas to read the csv back into python, pd.read_csv() provides a few ways to specify that a column represents a date. https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html

Try pd.read_csv(file.csv, parse_dates=[<colnum>]), where colnum is an integer index to your date column.

read_csv() provides additional options for parsing dates. Alternatively, you could use the 'dtypes' arg.

score -1 · Answer 2 · answered Jul 22 '20 at 19:23

Unfortunately, you can not store datatype in CSV format.

One thing you can do if you are only reading the file in python is to use pickle

you can do that like:

import pickle
with open('filename.pkl', 'wb') as pickle_file:
    pickle.dump(your_csv_file, pickle_file)

and you can load it using

with open('filename.pkl', 'rb') as pkl_file:
    csv_file = pickle.load(pkl_file)

Pandas to CSV column datatype

2 Answers2