0

I have this csv file which look like this:

datetime_period,value
01 Apr 2021 00:00,92.85
01 Apr 2021 00:30,91.73
11 Feb 2021 19:30,88.58
11 Feb 2021 20:00,88.58
13 Jan 2021 13:00,80.49
13 Jan 2021 13:30,81.95

I want to sort this csv according to the datetime in column datetime_period.

I used this link Sorting a csv object by dates in python as a guide.

Here is my code;

data = csv.reader(open(csv_file_path, 'r'))
data = sorted(data, key=lambda row: datetime.strptime(row[0], "%d %b %Y %H:%M"))

I get the error:

ValueError: time data 'time_period' does not match format '%d %b %Y %H:%M'
python-BaseException

I looked at the code and think my code does match the datetime format. Can someone tell me what is wrong with the code or provide an alternative solution?

I am using python 3.8.5

martineau
  • 119,623
  • 25
  • 170
  • 301
user3848207
  • 3,737
  • 17
  • 59
  • 104

2 Answers2

4

You need to skip the header row at the beginning of the file. This can easily be done by passing the csv.reader object to the built-in next() function to iterate it once. The code below shows how to do that and have the file closed properly after its contents have been read.

import csv
from datetime import datetime

csv_file_path = 'time_data.csv'

with open(csv_file_path, 'r', newline='') as file:
    reader = csv.reader(file)
    next(reader)  # Skip over header row.
    data = sorted(reader, key=lambda row: datetime.strptime(row[0], "%d %b %Y %H:%M"))

print(data)

If you're unsure whether a header is present or not, you can do the skipping conditionally — see my answer to the very similar How to ignore the first line of data when processing CSV data? question.

martineau
  • 119,623
  • 25
  • 170
  • 301
1

You're trying to sort your header line, which does not contain datetime values.

A way around this is to pop off the headers prior to sorting, and add them back after.

data = [line for line in csv.reader(open('test.csv', 'r'))]

headers = data.pop(0)

data = sorted(data, key=lambda row: datetime.strptime(row[0], "%d %b %Y %H:%M"))
#[['13 Jan 2021 13:00', '80.49'], ['13 Jan 2021 13:30', '81.95'], ['11 Feb 2021 19:30', '88.58'], ['11 Feb 2021 20:00', '88.58'], ['01 Apr 2021 00:00', '92.85'], ['01 Apr 2021 00:30', '91.73']]

out = [headers] + data

#[['datetime_period', 'value'], ['13 Jan 2021 13:00', '80.49'], ['13 Jan 2021 13:30', '81.95'], ['11 Feb 2021 19:30', '88.58'], ['11 Feb 2021 20:00', '88.58'], ['01 Apr 2021 00:00', '92.85'], ['01 Apr 2021 00:30', '91.73']]
PacketLoss
  • 5,561
  • 1
  • 9
  • 27