
I am using python to collect temperature data but only want to store the last 24 hours of data.

I am currently generating my .csv file with this

import datetime

# mcp is the temperature sensor object, set up elsewhere
while True:
    tempC = mcp.temperature
    tempF = tempC * 9 / 5 + 32
    timestamp = datetime.datetime.now().strftime("%y-%m-%d %H:%M   ")

    f = open("24hr.csv", "a")
    f.write(timestamp)
    f.write(',{}'.format(tempF))
    f.write("\n")
    f.close()

The .csv this outputs looks like this

18-12-13 10:58   ,44.7125
18-12-13 11:03   ,44.6
18-12-13 11:08   ,44.6
18-12-13 11:13   ,44.4875
18-12-13 11:18   ,44.6
18-12-13 11:23   ,44.4875
18-12-13 11:28   ,44.7125

I don't want to roll over, just keep the last 24 hours of data. Since I am sampling every 5 minutes, I should end up with 288 lines in my CSV after 24 hours. If I use readlines() I can tell how many lines I have, but how do I get rid of any lines that are older than 24 hours? This is what I came up with, which obviously doesn't work. Suggestions?

f = open("24hr.csv","r")
lines = f.readlines()
f.close()

if lines => 144:
   f = open("24hr.csv","w")
   for line in lines:
       if line <= "timestamp"+","+"tempF"+\n":
           f.write(line)
           f.close()
SR.

4 Answers


You've done most of the work already. I've got a couple of suggestions.

  1. Use with. This will mean that if there's an error mid-way through your program and an exception is raised, the file will be closed properly.
  2. Parse the timestamp from the file and compare it with the current time.
  3. Use len to check the length of a list.

Here's the amended program:

import datetime

with open("24hr.csv","r") as f:
    lines = f.readlines()  # read out the contents of the file

if len(lines) >= 288:
    yesterday = datetime.datetime.now() - datetime.timedelta(days=1)
    with open("24hr.csv", "w") as f:
        for line in lines:
            line_time_string = line.split(",")[0]
            line_time = datetime.datetime.strptime(line_time_string, "%y-%m-%d %H:%M   ")

            if line_time > yesterday:  # if the line's time is after yesterday
                f.write(line)  # write it back into the file

This code's not very clean (doesn't conform to PEP-8) but you see the general process.

wizzwizz4

Are you using Linux? If you just need the last 288 lines you can try

tail -n 288 file.csv

You can find tail for Windows too; I got one with CMDer. If you have to use Python and the file is small enough to fit in RAM, load it with readlines() into a list, slice it (lst = lst[-288:]) and rewrite the file. If you are not sure how many lines you have, parse it with https://docs.python.org/3.7/library/csv.html , parse the time into a Python datetime (similar to how you write the time originally) and write out lines by condition.
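The in-memory slicing approach described above could be sketched as a small helper. `keep_last` is a hypothetical name, and the default of 288 assumes one sample every 5 minutes for 24 hours:

```python
def keep_last(path, n=288):
    """Trim the file at path in place so only the newest n lines remain."""
    with open(path) as f:
        lines = f.readlines()  # the file is small, so holding it in RAM is fine
    with open(path, "w") as f:
        # a negative slice keeps the *last* n lines; if the file has
        # fewer than n lines, the slice keeps them all
        f.writelines(lines[-n:])
```

You would call keep_last("24hr.csv") once per loop iteration, after appending the new sample.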

struckoff

Given that 288 lines will not take up much memory, I think it is perfectly fine to just read the lines, truncate the file, and put back the desired lines:

# Unless you are working in a system with limited memory
# reading 288 lines isn't much
def remove_old_entries(file_):
    file_.seek(0)  # just in case, go to the start
    lines = file_.readlines()[-288:]  # read the last 288 lines
    file_.truncate(0)  # empty the file
    file_.seek(0)  # rewind: truncate() does not move the file position
    file_.writelines(lines)  # put back just the desired lines

    return file_

while True:
    tempC = mcp.temperature
    tempF = tempC * 9 / 5 + 32
    timestamp = datetime.datetime.now().strftime("%y-%m-%d %H:%M   ")

    with open("24hr.csv", "r+") as file_:
        file_ = remove_old_entries(file_)  # Consider that the function will return the file at the end
        file_.write('{},{}\n'.format(timestamp, tempF))

    # I hope mcp.temperature is blocking or you are sleeping out the 5min
    # else this file reading in an infinite loop will get out of hand
    # time.sleep(300)  # Call me maybe
Dalvenjia
  • This actually reads all of the lines, then discards all of them other than the last 288. It's storing them all in memory at once. – wizzwizz4 Dec 14 '18 at 19:16
  • Given the nature of the program, there shouldn't be more than 1 line discarded. I was going for `[1:]`, but with a new file that has fewer than 288 lines that would not work as expected; with a negative slice, if there are fewer than 288 lines it takes them all regardless – Dalvenjia Dec 14 '18 at 19:20
  • I had a more performant approach in mind: open the file in binary mode and seek from the end. But that relies on fixed-length lines (`{.04f}` for the temperatures), and I'm not sure whether the OP can change that format – Dalvenjia Dec 14 '18 at 19:26
  • You could still do that, counting the number of `\n` characters (allowing for the first character being or not being a new line, in case of a malformed file) until you get 288 lines, to minimise the amount of the file that needs to be in memory. – wizzwizz4 Dec 14 '18 at 19:59

If you are on Linux or the like, the right approach is to implement log rotation

Suku
  • no, I specifically do not want to rotate logs. unless there is a usage there I'm not familiar with. I want the last 24 hrs, not 24hr +n whenever the last rotation was. – SR. Dec 14 '18 at 19:14
  • @SR. It's still the correct approach. You might need those logs later. – wizzwizz4 Dec 14 '18 at 20:00