
I have a text file called temp.txt, and every day at 21:45 I want to delete all rows in it whose date is more than 24 hours old. I've done a lot of googling and can't find the answer anywhere. The text file is in this format, with no headers:

http://clipsexample1.com,clips1,clipexample123,2019-03-28 17:14:14
http://clipsexample12com,clips2,clipexample234,2019-03-27 18:56:20

Is there any way I could remove the whole row if it is older than 24 hours (the second clip in the example)?

EDIT: I have tried using this code, but it only filters on today's date. How do I get it to use today minus 24 hours instead?

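import os
from datetime import date

today = date.today()  # Assumed definition; not shown in the original snippet.
save_path = 'clips/'
completeName = os.path.join(save_path, 'clips'+str(today)+'.txt')
good_dates = [str(today)]  # Only lines containing today's date are kept.
with open('temp.txt') as oldfile, open(completeName, 'w') as newfile:
    for line in oldfile:
        if any(good_date in line for good_date in good_dates):
            newfile.write(line)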

EDIT 30/03/2019: Here is my full code to try and understand how the timestamp field is created:

import datetime as dt

# Convert UNIX timestamps to standard datetime format.
# topics_data is a pandas DataFrame built earlier (not shown).
def get_date(created_utc):
    return dt.datetime.fromtimestamp(created_utc)
_timestamp = topics_data["created_utc"].apply(get_date)
topics_data = topics_data.assign(timestamp = _timestamp)
timestamp = _timestamp
print(timestamp)

#remove UNIX data column
topics_data.drop('created_utc', axis=1, inplace=True)

#export clips to temp.txt
topics_data.to_csv('temp.txt', header=True, index=False)

import csv
from datetime import datetime, timedelta
import os


today = datetime.today()
cutoff = datetime(year=today.year, month=today.month, day=today.day,
                  hour=21, minute=45)
max_time_diff = timedelta(hours=24)

input_file = 'temp.txt'
save_path = './clips'
complete_name = os.path.join(save_path, 'clips'+today.strftime('%Y-%m-%d')+'.txt')
os.makedirs(save_path, exist_ok=True)  # Make sure dest directory exists.

with open(input_file, newline='') as oldfile, \
     open(complete_name, 'w', newline='') as newfile:

    reader = csv.reader(oldfile)
    writer = csv.writer(newfile)

    for line in reader:
        line_date = datetime.strptime(line[3], "%Y-%m-%d %H:%M:%S")
        if cutoff - line_date < max_time_diff:
            writer.writerow(line)

When I print the timestamp field, this is the result I get:

01   2019-03-29 01:22:09
02   2019-03-29 02:42:21
03   2019-03-28 17:14:14
04   2019-03-29 06:06:18
Name: created_utc, dtype: datetime64[ns]

And the error I am still getting is:

ValueError: time data 'timestamp' does not match format '%Y-%m-%d %H:%M:%S'

Even though the datetime is printing in that format?

Oli Shingfield
  • Yes, there are ways, and one would be to read the file and write out a new one (at the same time) without the rows/lines in it you don't want. Since this looks like a CSV file, I would suggest using the [`csv`](https://docs.python.org/3/library/csv.html#module-csv) module to do both. – martineau Mar 28 '19 at 22:47
  • Hi, thanks for the reply. Please see the edit to my original post and let me know your thoughts. – Oli Shingfield Mar 28 '19 at 22:55

1 Answer


Here's how to do it using the csv module as I suggested in a comment:

import csv
from datetime import datetime, timedelta
import os


today = datetime.today()
cutoff = datetime(year=today.year, month=today.month, day=today.day,
                  hour=21, minute=45)
max_time_diff = timedelta(hours=24)

input_file = 'temp.txt'
save_path = './clips'
complete_name = os.path.join(save_path, 'clips'+today.strftime('%Y-%m-%d')+'.txt')
os.makedirs(save_path, exist_ok=True)  # Make sure dest directory exists.

with open(input_file, newline='') as oldfile, \
     open(complete_name, 'w', newline='') as newfile:

    reader = csv.reader(oldfile)
    writer = csv.writer(newfile)

    next(reader)  # Skip header.
    for line in reader:
        line_date = datetime.strptime(line[3], "%Y-%m-%d %H:%M:%S")
        if cutoff - line_date < max_time_diff:
            writer.writerow(line)

print('done')
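
To see how the comparison behaves, here is a small self-contained sketch that applies the same test to the two sample rows from the question. The fixed cutoff of 2019-03-28 21:45 and the helper name `is_recent` are assumptions made for illustration only; the code above computes the cutoff from today's date.

```python
from datetime import datetime, timedelta

DATE_FORMAT = "%Y-%m-%d %H:%M:%S"
MAX_TIME_DIFF = timedelta(hours=24)


def is_recent(date_text, cutoff):
    """Return True if date_text is less than 24 hours before cutoff."""
    line_date = datetime.strptime(date_text, DATE_FORMAT)
    return cutoff - line_date < MAX_TIME_DIFF


# Hypothetical cutoff: 21:45 on the day of the newer sample row.
cutoff = datetime(2019, 3, 28, 21, 45)

print(is_recent("2019-03-28 17:14:14", cutoff))  # True  (4:30:46 before cutoff)
print(is_recent("2019-03-27 18:56:20", cutoff))  # False (1 day, 2:48:40 before cutoff)
```

Note that a row dated after the cutoff gives a negative difference, which is also less than 24 hours, so such rows are kept as well.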
martineau
  • Hi there, I tried using your code but I am getting the error: "ValueError: time data 'timestamp' does not match format '%Y-%m-%d %H:%M:%S'", see link: https://pastebin.com/5XePadwk – Oli Shingfield Mar 29 '19 at 09:51
  • Oli: From the error message, it sounds like the format of the time data in the actual file doesn't match what's shown in your question: e.g. `2019-03-28 17:14:14`. – martineau Mar 29 '19 at 13:51
  • Hi there, thanks for the reply, could you kindly take a look at the edit of my original post and let me know what you think? Thank you – Oli Shingfield Mar 30 '19 at 11:29
  • As I said, the error indicates the data in the file doesn't match what's shown in your question. I suspect the file _does_ have a header row at the beginning (despite your claim that it "is in this format with no headers"), because with your edit I can now see it is being created by calling `topics_data.to_csv('temp.txt', header=True, index=False)`; note the `header=True` part. A header row would cause exactly this `ValueError`. – martineau Mar 30 '19 at 16:49
  • I added a `next(reader)` to my answer to skip the header row which I now think is in the data file. Note that the way things are, the new file being written _won't_ have this header in it since it's just being ignored (i.e. as opposed to copied). – martineau Mar 30 '19 at 17:10
  • Thank you very much, amazing – Oli Shingfield Mar 31 '19 at 20:57
  • Oli: You're welcome. You might also find my answer to the question [When processing CSV data, how do I ignore the first line of data?](https://stackoverflow.com/questions/11349333/when-processing-csv-data-how-do-i-ignore-the-first-line-of-data) useful because if you used it, this code could be modified to conditionally skip the first line (instead of it being hardcoded to do so). – martineau Mar 31 '19 at 21:03
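
The comments above mention two loose ends: the header row is dropped from the output, and the skip is hardcoded. One possible way to handle both, as a rough sketch rather than the approach from the linked answer, is to try parsing the date column of the first row and treat the row as a header only if that fails. The names `parse_date` and `filter_rows` are hypothetical, and so are the header column names other than `timestamp`.

```python
import csv
import io
from datetime import datetime, timedelta

DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_date(text):
    """Return a datetime, or None if text is not in DATE_FORMAT."""
    try:
        return datetime.strptime(text, DATE_FORMAT)
    except ValueError:
        return None


def filter_rows(infile, outfile, cutoff, max_time_diff=timedelta(hours=24)):
    """Copy recent rows from infile to outfile, keeping a header if present."""
    reader = csv.reader(infile)
    writer = csv.writer(outfile)
    for index, row in enumerate(reader):
        if not row:
            continue  # Ignore blank lines.
        line_date = parse_date(row[3])
        if line_date is None:
            if index == 0:
                writer.writerow(row)  # First row has no date: treat as header.
                continue
            raise ValueError(f"bad date on row {index + 1}: {row[3]!r}")
        if cutoff - line_date < max_time_diff:
            writer.writerow(row)


# In-memory stand-in for temp.txt, using the sample rows from the question.
sample = io.StringIO(
    "url,name,title,timestamp\r\n"
    "http://clipsexample1.com,clips1,clipexample123,2019-03-28 17:14:14\r\n"
    "http://clipsexample12com,clips2,clipexample234,2019-03-27 18:56:20\r\n",
    newline="",
)
result = io.StringIO(newline="")
filter_rows(sample, result, cutoff=datetime(2019, 3, 28, 21, 45))
print(result.getvalue())
```

With real files, the two `io.StringIO` objects would be replaced by the `open(..., newline='')` calls from the answer. Unlike `next(reader)`, this keeps working if the file is later written with `header=False`.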