
I have a large csv file that I need to reduce to the last 1000 lines through a cron job every day.

Can anyone suggest how to accomplish this?

What I have so far is two commands, but I do not know how to combine them.

For deleting lines from the beginning of the file the command is

ed -s file.csv <<< $'1,123d\nwq'

where 123 is the number of lines needed to delete from the beginning of the file

For reading the number of lines in the file the command is

wc -l file.csv

I would need to subtract 1000 from this and pass the result to the first command. Is there any way to combine the result of the wc command with the ed command?
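One way to do exactly this combination (a sketch, assuming bash; the `file.csv` below is throwaway demo data generated on the spot, not a real file) is to capture the wc output with command substitution and feed the computed range to ed:

```shell
# Sketch: feed the line count from wc into ed's address range.
# Demo data: a 1500-line file, to be trimmed to its last 1000 lines.
seq 1 1500 > file.csv
lines=$(wc -l < file.csv)                 # total line count
if [ "$lines" -gt 1000 ]; then
  # delete lines 1..(lines-1000), then write and quit
  printf '1,%dd\nwq\n' "$((lines - 1000))" | ed -s file.csv
fi
```

The `-gt 1000` guard matters: without it, a file of 1000 or fewer lines would produce an invalid (zero or negative) ed address.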

Thank you in advance

manolish
  • possible duplicate of [How can I remove all but the last 10 lines from a file?](http://stackoverflow.com/questions/3775383/how-can-i-remove-all-but-the-last-10-lines-from-a-file) – Elliott Frisch May 29 '15 at 03:04
  • 3
    Use: `tail -n -1000 file > newfile; mv newfile file` – user3439894 May 29 '15 at 03:10
  • Shelter this works, but it would be more convenient if I could edit the file and not create a new file. – manolish May 29 '15 at 14:39
  • 1
    Slight improvement: `tail -n 1000 file > newfile; cat newfile > file; rm newfile` This preserves the ownership of 'file'. If you run from cronjob, as root, mv will mean file ends up being owned by root. – BenTaylor Sep 30 '16 at 09:14
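The tail-based approach from the comment thread can be sketched as follows (assuming there is disk space for the temporary copy; the demo file is generated on the spot). Overwriting with `cat >` rather than `mv` keeps the original file's inode, owner, and permissions:

```shell
# Sketch of the comments' tail + overwrite-in-place approach.
seq 1 1500 > file           # demo data: 1500 lines
tail -n 1000 file > newfile # keep only the last 1000 lines
cat newfile > file          # write back in place, preserving ownership
rm newfile
```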

1 Answer


Assuming bash is the shell and 'file' is the file (and it exists):

sed -i "1,$(( $(wc -l < file) - 1000 ))d" file

Edit: the brief version above will not work cleanly for files with 1000 or fewer lines. A more robust script, handling all .csv files in a particular directory:

#!/usr/bin/env bash

DIR=/path/to/csv/files
N=1000

for csv in "$DIR"/*.csv; do
  L=$(wc -l < "$csv")            # line count of this file
  [ "$L" -le "$N" ] && continue  # nothing to trim
  sed -i "1,$((L - N))d" "$csv"  # delete all but the last N lines
done

Next edit: handle a directory with no .csv files?
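One answer to that question (my assumption, not part of the original answer) is bash's nullglob option: with it set, an unmatched `*.csv` pattern expands to nothing, so the loop body simply never runs instead of iterating once over the literal string `$DIR/*.csv`:

```shell
# Sketch: with nullglob, an unmatched glob expands to zero words.
# DIR here is a throwaway empty demo directory.
shopt -s nullglob
DIR=$(mktemp -d)            # empty directory: contains no .csv files
count=0
for csv in "$DIR"/*.csv; do
  count=$((count + 1))      # never reached when the glob matches nothing
done
echo "$count"
rmdir "$DIR"
```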

sjnarv