
So I have a Python program that pulls access logs from remote servers and processes them. There are separate log files for each day. The files on the servers are in this format:

access.log
access.log-20130715
access.log-20130717

The file "access.log" is the log file for the current day, and is modified throughout the day as new data arrives. The files with a date appended are archived log files, and are never modified. If any file in the directory changes, it is either because (1) data is being added to "access.log", or (2) "access.log" is being archived and an empty file is taking its place. Every minute or so, my program checks the most recent modification time of the files in the directory, and if it has changed, it pulls down "access.log" and any newly archived files.

All of this currently works fine. However, if a lot of data is added to the log file throughout the day, downloading the whole thing over and over just to get some of the data at the end of the file will create a lot of traffic on the network, and I would like to avoid that. Is there any way to only download a part of the file? If I have already processed, say 1 GB of the file, and another 500 bytes suddenly get added to the log file, is there a way to only download the 500 bytes at the end?

I am using Python 3.2, my local machine is running Windows, and the remote servers all run Linux. I am using Chilkat for making SSH and SFTP connections. Any help would be greatly appreciated!

randrews
  • Check out [How to FTP 'get' a partial file only](http://serverfault.com/questions/18834/how-to-ftp-get-a-partial-file-only) on [serverfault](http://serverfault.com/). – martineau Jul 18 '13 at 18:19
  • According to this section of a Wikipedia article, standard FTP does not support [partial file transfer](http://en.wikipedia.org/wiki/GridFTP#Partial_file_transfer), but apparently there's something called [GridFTP](http://en.wikipedia.org/wiki/GridFTP) that does. – martineau Jul 18 '13 at 18:27
  • Since you're trying to download to a Windows machine, this [Partial FTP Downloader](http://www.codeproject.com/Articles/27439/Partial-FTP-Downloader) .NET Code Project article might be helpful. – martineau Jul 18 '13 at 18:35

2 Answers


Call ResumeDownloadFileByName. Here's the description of the method in the Chilkat reference documentation:

Resumes an SFTP download. The size of the localFilePath is checked and the download begins at the appropriate position in the remoteFilePath. If localFilePath is empty or non-existent, then this method is identical to DownloadFileByName. If the localFilePath is already fully downloaded, then no additional data is downloaded and the method will return True.

See http://www.chilkatsoft.com/refdoc/pythonCkSFtpRef.html
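
As a rough illustration of how that call might be used for the polling loop in the question, here is a minimal sketch. The helper name `fetch_new_bytes` is hypothetical, and `sftp` is assumed to be a `chilkat.CkSFtp` object that has already been connected, authenticated, and initialized. The sketch does not handle log rotation: when "access.log" is archived and replaced by an empty file, the local copy would presumably need to be moved aside before resuming.

```python
import os


def fetch_new_bytes(sftp, remote_path, local_path):
    """Append any new remote bytes to local_path and return how many arrived.

    Assumption: `sftp` is a connected, authenticated chilkat.CkSFtp object.
    """
    # Remember the local size so we can report how much was added.
    before = os.path.getsize(local_path) if os.path.exists(local_path) else 0

    # Chilkat compares the local size with the remote file and only
    # transfers the part that is missing locally.
    if not sftp.ResumeDownloadFileByName(remote_path, local_path):
        # lastErrorText() is Chilkat's accessor for failure details.
        raise RuntimeError(sftp.lastErrorText())

    return os.path.getsize(local_path) - before
```

Calling `fetch_new_bytes(sftp, "access.log", "access.log")` once per polling interval should then transfer only the bytes appended since the previous call, under the assumptions above.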

Chilkat Software

You could do that, or you could massively reduce your complexity by splitting the latest log file into smaller files, one per hour or per ten minutes.

Ali Afshar
  • If you are referring to the log file on the local machine, I already track the number of bytes that have been processed, and use file.seek() to skip to the new data. I just don't want to have to download the data that I don't need. – randrews Jul 18 '13 at 16:38
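
The local byte-offset tracking described in this comment might look roughly like the sketch below. The function name `read_new_data` is hypothetical, and resetting the offset when the file shrinks is an assumption about how rotation would show up locally, not something stated in the thread.

```python
import os


def read_new_data(path, offset):
    """Return (new_bytes, new_offset) for data added to path after offset."""
    # If the file is now smaller than the saved offset, assume it was
    # rotated and start again from the beginning (an assumption).
    if os.path.getsize(path) < offset:
        offset = 0

    with open(path, "rb") as f:
        f.seek(offset)      # skip everything already processed
        data = f.read()     # read only the newly appended bytes

    return data, offset + len(data)
```

The caller would store the returned offset between polls and pass it back in on the next call.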