2

I’m using Pandas and SQL Alchemy to import data from SQL. One of the SQL columns is datetime. I then covert the SQL data into a Pandas dataframe, the datetime column is “datetime64” – which is fine. I am able to use Matplotlib to plot any of my other columns against datetime. I then covert my pandas dataframe to a csv using:

 df.to_csv('filename')

This is to save me having to keep running a rather large sql query each time i log on. If i then try to read the csv back into python and work from that, the datetime column in now of datatype “object” rather than ”datetime64”. This means Matplotlib won't let me plot other columns against datetime because the datetime column is the wrong datatype.

How do I ensure that it stays as the correct datatype during the df to csv process?

EDIT:

The comments/solutions to my original post did work in getting the column to the correct dtype. However I now have a different problem. When i plot against the "datetime" column is looks like this:

enter image description here

When it should be looking like this (this is how it looks when I'm working directly with the SQL data).

enter image description here

I assume the datetime column is still not quite in the correct dtype (even though it states it is datetime64[ns].

Mark
  • 109
  • 5
  • Did you tried to convert the column to a datetime format https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_datetime.html ? – mhannani Jul 22 '20 at 19:21
  • 1
    As already noted, csv format does not have the ability to store dtypes. Your options are to select a different file type that has the ability to store dtypes, or, as shown in the duplicate, specify the column to be parsed as a datetime when reading the file with `.read_csv()`. – Trenton McKinney Jul 22 '20 at 19:29
  • 1
    By the looks of your graph, only the data for the first third or so of each month is messed up. This is probably because those dates are written to CSV in dd-mm-yyyy format, while pandas (or really the C++ library pandas uses) defaults to mm-dd-yyyy. Try using `pd.read_csv(...., dayfirst=True)` to parse it again. – Swier Jul 23 '20 at 08:59
  • Is your dataframe sorted by `Datetime` ? – Balaji Ambresh Jul 23 '20 at 12:56

2 Answers2

1

CSV is a plain text format and does not specify the data type of any column. If you are using pandas to read the csv back into python, pd.read_csv() provides a few ways to specify that a column represents a date. https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html

Try pd.read_csv(file.csv, parse_dates=[<colnum>]), where colnum is an integer index to your date column.

read_csv() provides additional options for parsing dates. Alternatively, you could use the 'dtypes' arg.

proteome
  • 316
  • 1
  • 4
-1

Unfortunately, you can not store datatype in CSV format.

One thing you can do if you are only reading the file in python is to use pickle

you can do that like:

import pickle
with open('filename.pkl', 'wb') as pickle_file:
    pickle.dump(your_csv_file, pickle_file)

and you can load it using

with open('filename.pkl', 'rb') as pkl_file:
    csv_file = pickle.load(pkl_file)
JaySabir
  • 322
  • 1
  • 10