I have written a Python script that reads from a continuously updated file ('out.txt') and writes to a different file ('received.txt') every 10 seconds. Now I need to figure out how to delete the data that has already been read from 'out.txt'. Here is the code I have so far.

#!/usr/bin/python

import sys
import time

with open('out.txt') as src:
    num_lines = sum(1 for line in src)   # count the lines already in out.txt
print(num_lines)

sys.stdout = open('received.txt', 'w')   # send all later output to received.txt

f = open('out.txt', 'r')                 # open 'out.txt' with read permissions
for _ in range(num_lines):               # skip the lines counted above
    f.readline()
while True:                              # copy lines appended since the last pass
    for line in f:
        sys.stdout.write(line)
    sys.stdout.flush()                   # make the new lines visible in received.txt
    time.sleep(10)                       # sleep for 10 seconds

Do I delete the data in 'out.txt' after the loop or inside the loop? Should I use f.write for this? I am using Raspbian on a Raspberry Pi for this. The data for 'out.txt' looks like

iBeacon scan ...

3F234454-CF6D-4A0F-ADF2-F4911BA9FFA6 1 1 -71 -66

3F234454-CF6D-4A0F-ADF2-F4911BA9FFA6 1 1 -71 -66

3F234454-CF6D-4A0F-ADF2-F4911BA9FFA6 1 1 -71 -66

... keeps updating.

Any advice would be extremely helpful. Thank you!

Thomas Hall
  • Check http://stackoverflow.com/q/10349781/1860929 – Anshul Goyal Jul 31 '14 at 14:38
  • Is using a named pipe for out.txt an option? – Raul Andres Jul 31 '14 at 14:38
  • Try using a named pipe instead; using a plain text file sounds really awkward. – roippi Jul 31 '14 at 14:38
  • The little-known truth is that you never delete just some lines from a file; you always rewrite the whole file (all applications do that behind the scenes). Why don't you use a pipe instead of a file? – Paulo Scardine Jul 31 '14 at 14:39
  • If you open the file in write/read mode (`w+`), it will be truncated (all of its contents removed). – okoboko Jul 31 '14 at 14:39
  • Be careful with the named pipe. If opened in blocking mode (default) the producer process can block waiting for a consumer to open/read from the pipe. This may be a problem for the producer if it is not expecting it. – mhawke Jul 31 '14 at 15:14
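
The named-pipe suggestion from the comments could look roughly like the sketch below. It assumes a POSIX system and that the producer can be pointed at a FIFO instead of a regular file; `ensure_fifo` and `read_fifo_lines` are hypothetical helper names, not part of the original script.

```python
import os

def ensure_fifo(path):
    """Create the named pipe at `path` unless something already exists there."""
    if not os.path.exists(path):
        os.mkfifo(path)

def read_fifo_lines(path):
    """Yield lines from a named pipe until every writer has closed it."""
    # open() blocks here until some process opens the pipe for writing.
    with open(path, 'r') as fifo:
        for line in fifo:
            # Data read from a pipe is consumed, so nothing needs deleting.
            yield line.rstrip('\n')
```

Because each line is consumed as it is read, there is no "already read" data left to remove. As noted in the comments, the producer will also block on its own open() until a reader is attached.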

1 Answer

There's a problem with this approach: on POSIX at least (i.e. pretty much everything but Windows), as long as any process holds an open file handle, the file still exists on disk until all open file handles are closed.
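
A minimal sketch of that behaviour, assuming a POSIX system and a throwaway temporary file (`read_after_unlink` is a hypothetical name used only for illustration):

```python
import os
import tempfile

def read_after_unlink(text):
    """Write text to a temp file, unlink it while a handle is open, read it back."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    with open(path, 'w') as writer:
        writer.write(text)
    reader = open(path, 'r')              # handle opened before the path is removed
    os.unlink(path)                       # the directory entry is gone...
    still_listed = os.path.exists(path)
    data = reader.read()                  # ...but the data is still readable
    reader.close()                        # only now can the storage be released
    return still_listed, data
```

The path disappears immediately, yet the reader still gets the full contents, which is why deleting 'out.txt' from the consumer side does not stop a producer that already has it open.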

So if you have two processes, one writing and another reading and trimming, the writing process has to be aware that the file got truncated (or deleted) and re-open the destination file every time.
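
To give an idea of what "being aware" would involve, here is a rough sketch of a check a process could run before each write. It assumes POSIX semantics and a handle opened in binary mode; `needs_reopen` is a hypothetical helper, not something the original script contains.

```python
import os

def needs_reopen(handle, path):
    """Return True if `path` no longer matches the file behind `handle`,
    or if that file has shrunk below the handle's current offset."""
    try:
        on_disk = os.stat(path)
    except FileNotFoundError:
        return True                            # the path was deleted
    opened = os.fstat(handle.fileno())
    if (on_disk.st_ino, on_disk.st_dev) != (opened.st_ino, opened.st_dev):
        return True                            # the path now names a different file
    return on_disk.st_size < handle.tell()     # the file was truncated underneath us
```

Polling like this on every write is exactly the kind of extra coupling between producer and consumer that makes the approach unattractive.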

That is a ridiculously ugly way of doing it - it requires communication between producer and consumer and is basically unnecessary.

The smart money would use something like logrotate, which has built-in mechanisms to run a 'HUP' or 'restart' command that notifies the producer that the file has been truncated.
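
On the producer side, reacting to such a 'HUP' might look like the sketch below. This assumes the producer is a Python process you control; `ReopenableLog` and `install_hup_handler` are hypothetical names chosen for illustration.

```python
import signal

class ReopenableLog:
    """Append-only output file that can be reopened on request."""

    def __init__(self, path):
        self.path = path
        self.handle = open(path, 'a')
        self.reopen_requested = False

    def request_reopen(self, signum=None, frame=None):
        # Usable as a signal handler: do as little as possible, just set a flag.
        self.reopen_requested = True

    def write(self, line):
        if self.reopen_requested:
            # The old file was rotated away; start writing to the current path.
            self.handle.close()
            self.handle = open(self.path, 'a')
            self.reopen_requested = False
        self.handle.write(line)
        self.handle.flush()

def install_hup_handler(log):
    """Reopen the log the next time it is written to after a SIGHUP arrives."""
    signal.signal(signal.SIGHUP, log.request_reopen)
```

The handler only sets a flag, and the actual reopen happens on the next write, which keeps the work done inside the signal handler to a minimum.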

If you really just want round-robin data, why not use sqlite with a schema that 'wraps' around as you hit the maximum number of lines you want to consume?

This example provides a table that removes the oldest record and inserts a new one once you hit the maximum of 20 records. Depending on the amount of data churn, this could be a luxury you can't afford. But if you just want the last 1000 measurements of CPU load before a system crash, it'll work fine. In fact, it could do much more elegant things, such as generating running averages through triggers, using SQL rather than writing code.

CREATE TABLE activity_t (
  id        INTEGER PRIMARY KEY AUTOINCREMENT,
  seq       INTEGER UNIQUE,
  ts        TEXT DEFAULT CURRENT_TIMESTAMP,
  bin       TEXT NOT NULL,
  path      TEXT NOT NULL);

-- seed the sqlite_sequence table so the trigger below finds a counter row:
INSERT INTO activity_t ( seq, bin, path ) VALUES ( -1, 'init', 'init' );
DELETE FROM activity_t WHERE seq = -1;

-- view
CREATE VIEW activity AS SELECT id, seq, ts, bin, path FROM activity_t;

-- trigger to snipe inserts and handle the 'wrap around' limitation
CREATE TRIGGER activity_trg
  INSTEAD OF INSERT ON activity
  FOR EACH ROW
  BEGIN
    INSERT OR REPLACE INTO activity_t ( seq, bin, path ) VALUES (
      ( SELECT seq + 1 FROM sqlite_sequence WHERE name = 'activity_t' ) % 20,
      NEW.bin,
      NEW.path);
  END;
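
For a quick way to try the schema above, the following sketch drives it from Python's built-in sqlite3 module using an in-memory database. The `insert_through_view` helper and the 'bin'/'path' values it inserts are made up purely for illustration.

```python
import sqlite3

SCHEMA = """
CREATE TABLE activity_t (
  id   INTEGER PRIMARY KEY AUTOINCREMENT,
  seq  INTEGER UNIQUE,
  ts   TEXT DEFAULT CURRENT_TIMESTAMP,
  bin  TEXT NOT NULL,
  path TEXT NOT NULL);
INSERT INTO activity_t ( seq, bin, path ) VALUES ( -1, 'init', 'init' );
DELETE FROM activity_t WHERE seq = -1;
CREATE VIEW activity AS SELECT id, seq, ts, bin, path FROM activity_t;
CREATE TRIGGER activity_trg
  INSTEAD OF INSERT ON activity
  FOR EACH ROW
  BEGIN
    INSERT OR REPLACE INTO activity_t ( seq, bin, path ) VALUES (
      ( SELECT seq + 1 FROM sqlite_sequence WHERE name = 'activity_t' ) % 20,
      NEW.bin,
      NEW.path);
  END;
"""

def insert_through_view(count):
    """Insert `count` dummy rows via the view and return (seq, bin) rows, oldest first."""
    conn = sqlite3.connect(':memory:')
    conn.executescript(SCHEMA)
    for i in range(count):
        # Inserting into the view fires the INSTEAD OF trigger.
        conn.execute("INSERT INTO activity ( bin, path ) VALUES ( ?, ? )",
                     ('bin%d' % i, '/path/%d' % i))
    conn.commit()
    rows = conn.execute("SELECT seq, bin FROM activity ORDER BY id").fetchall()
    conn.close()
    return rows
```

Calling `insert_through_view(25)` should leave only 20 rows in the table, with the five oldest inserts overwritten by the newest ones.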
synthesizerpatel