0

I have a csv-file with several columns, one contains the date in the format dd.mm.yyyy. Its entries (rows) aren't in chronological order. What I did to put them into order is transforming those values using pd.to_datetime, followed by sort_values. (Code below).

Desired output: 01.09.2019, 02.09.2019, 03.09.2019, ...., 30.03.2020

However, I get: 01.01.2020, 01.02.2020, 01.09.2019, 01.11.2019, ..., 31.12.2019

daten = pd.read_csv("gewichtstagebuch.csv", sep=";", decimal=",",
                usecols=("Datum","Gewicht (kg)"))
pd.to_datetime(daten['Datum'])
daten.sort_values(by="Datum", ascending= True)

I tried other ways to sort or convert my data set but screenshot then Python assumes the months to be days and vise versa so that I still end up having the wrong result (e.g. forcing a format and strftime).

smci
  • 32,567
  • 20
  • 113
  • 146
  • 2
    Did you try using pd.to_datetime(daten['Datum'], dayfirst=True)? – Matthew Borish Mar 31 '20 at 17:57
  • The to_datetime() function has a parameter (format) to which you can pass the format of the datetime you’re formatting so it knows how to parse. – S3DEV Mar 31 '20 at 17:58
  • `pd.to_datetime` returns a `datetime` hence it doesn't update your `daten` dataframe. The operation is not `in_place`. – Afaq Mar 31 '20 at 18:25
  • [`pd.read_csv()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html) has options `parse_dates`, and you'll want `dayfirst = True` for dd.mm.yyyy format. You might need `infer_datetime_format = True` to handle the dot separator. In general you shouldn't need to retrospectively run `to_datetime()` on a column after reading the CSV - and it wastes tons of memory on temporarily storing date-strings. – smci Mar 31 '20 at 23:11
  • @ Matthew Borish: Yes I did, it had no effect in the order of the dates. – ThePhysicist Apr 01 '20 at 15:08

3 Answers3

0

At the time of reading the data using pandas you can use "parse_dates" argument which will automatically convert the field into datetime.

daten = pd.read_csv("prob_data.csv", sep=",", decimal=",", parse_dates=[0],
                usecols=("Datum","Gewicht (kg)"))

daten.sort_values(by="Datum", ascending= True)

In the above code I have mentioned parse_dates=[0] considering the first (0) column is the date column. Once we load the data it will automatically convert the column to date column.

Now if you run the code to sort the dataframe it will sort the dataset accordingly.

Hope this helps.

Adrish
  • 52
  • 1
  • 7
  • Hi Adrish, thanks for your help. I tried parse_dates as suggested in your comment but the result remains the same. First item of my list is 2019-01-09 followed by 2019-01-10 and so on. For me it seems it still assumes 01 to be January but actually the list starts with September. – ThePhysicist Apr 01 '20 at 15:01
0

pd.to_datetime returns a datetime and hence doesn't update your daten dataframe. Therefore the sort doesn't work as expected.

Try to do the following:

daten = pd.read_csv("gewichtstagebuch.csv", sep=";", decimal=",",
                usecols=("Datum","Gewicht (kg)"))
daten = daten.assign(Datum=lambda x: pd.to_datetime(x.Datum)) # this returns a new dataframe
daten.sort_values(by="Datum", ascending= True)
Afaq
  • 1,146
  • 1
  • 13
  • 25
  • Hi Afar, this code returns the following: 2019-01-09 2019-01-10 2019-01-11 2019-02-09 2019-02-10 ... so unfortunately still no chronological order. – ThePhysicist Apr 01 '20 at 15:05
0

For those who have the same or a similar problem - this solution suggested by smci solved the issue:

daten = pd.read_csv("gewichtstagebuch.csv", sep=";", decimal=",", parse_dates=[0],
                infer_datetime_format= True, dayfirst=True,
                usecols=("Datum","Gewicht (kg)"))
daten.sort_values(by="Datum", ascending= True)

So the combination of parse_dates, infer_datetime_format and dayfirst. Now I get the desired order:

2019-09-01
2019-09-02
2019-09-03
2019-09-04
...

TylerH
  • 20,799
  • 66
  • 75
  • 101