How can I be sure if a file was processed before? There is a remote storage location that serves as the file source for my application. My program fetches files from this location and processes them on a schedule. How can I make sure that each run fetches only files that have not been processed yet? I'm thinking about using file attributes; the archive attribute or the modified date could be a solution. I also learned that two bits of the file attributes are not used. How can I use these bits in Java? By the way, I don't want to use a database.
-
What do you mean by "two bits of file attributes not used"? What attributes do you want to use? – Matthew Flaschen Mar 26 '11 at 21:44
-
What is the operating system on the remote side? How do you access the files (network share, ftp, ...) ? – pajton Mar 26 '11 at 21:53
-
What filesystem? Windows:NTFS/VFat, Linux: (zillions), ...? – user unknown Mar 26 '11 at 21:56
-
Can you be sure that you never have to process a file multiple times? That a filename will not be repeated? Is it mainly a performance issue, or is it a serious problem to process a file twice? You can keep the names in memory. Do you alter the files, or just read them? You can store some information in a file or a database so that your program can be restarted. – user unknown Mar 26 '11 at 22:00
2 Answers
A common strategy is to use some form of hash function to create a checksum. Record the checksum of the file, and compare the list of processed files identified by checksum against the file in question. If the checksum of the file in question is in the list, you have already processed it.
Protect your list of processed file checksums. If you lose it, or it becomes corrupted, it might be a long, bad day.
To prevent unnecessary network traffic, you might consider preparing 'check' files on the remote repository that contain a checksum that corresponds to a potential input file.
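The checksum strategy above could be sketched roughly as follows. This is only an illustration under stated assumptions: the class name `ProcessedFileLedger`, the method `markIfNew`, the choice of SHA-256, and the plain-text ledger file (one hex checksum per line) are all hypothetical and not part of the original answer. It also assumes the file content is readable through a `java.nio.file.Path`, for example after it has been downloaded.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper: remembers checksums of processed files in a text file.
final class ProcessedFileLedger {
    private final Path ledgerFile;
    private final Set<String> seen = new HashSet<>();

    ProcessedFileLedger(Path ledgerFile) throws IOException {
        this.ledgerFile = ledgerFile;
        // Reload checksums recorded by earlier runs, if the ledger exists.
        if (Files.exists(ledgerFile)) {
            seen.addAll(Files.readAllLines(ledgerFile, StandardCharsets.UTF_8));
        }
    }

    // Streams the file through SHA-256 and returns the digest as lowercase hex.
    static String sha256(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                digest.update(buffer, 0, n);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    // Returns true (and records the checksum) only the first time this content is seen.
    boolean markIfNew(Path file) throws IOException, NoSuchAlgorithmException {
        String sum = sha256(file);
        if (!seen.add(sum)) {
            return false; // already processed
        }
        // Append immediately so the record survives a restart.
        Files.write(ledgerFile, Collections.singletonList(sum), StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        return true;
    }
}
```

A scheduled job would call `markIfNew(file)` for each fetched file and process only those for which it returns `true`. Because the key is the content hash, a file that is later appended to (a growing log file, for example) counts as new again, which may or may not be what you want.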
EDIT:
Upon further comment, it is possible to interact with file system attributes directly. Java 7 (the java.nio.file API) introduces file-system-specific attribute views for this purpose. The view you would be interested in is 'DosFileAttributeView'.
Basic use might be something similar to this ('input' is a java.nio.file.Path; add necessary exception handling):
// import java.nio.file.Files, java.nio.file.attribute.DosFileAttributeView
// and java.nio.file.attribute.DosFileAttributes
DosFileAttributeView view = Files.getFileAttributeView(input, DosFileAttributeView.class);
// a null view means the file system does not support DOS attributes
if (view != null)
{
    DosFileAttributes attributes = view.readAttributes();
    // skip any file already marked as an archive
    if (!attributes.isArchive())
    {
        myObject.process(input);
        // the setter lives on the view; DosFileAttributes is a read-only snapshot
        view.setArchive(true);
    }
}

-
I certainly did not intend to suggest one either. What part of my answer do you think implied that? – JustinC Mar 26 '11 at 22:06
-
Hm, storing the checksums of files. From the OP's suggestion of using file attributes I assume he wants to be able to tell if the file has been already processed by just looking at the file, without additional data. – pajton Mar 26 '11 at 22:13
-
OP will have to have something to compare it to, and unless the program will exist in memory indefinitely, that comparison value will have to be saved or persisted in some way, whether in a database proper, a loosely structured text file, or some binary serialization. – JustinC Mar 26 '11 at 22:17
-
Let me explain my problem completely. I am trying to index and archive all files in a directory. Files in this directory can be produced dynamically and continuously by a program, or a user can put files into it manually. The remote machine can run any operating system that uses NTFS. I also learned that every file has an associated byte used for file attributes, and that currently only six of its bits are in use. – ayengin Mar 27 '11 at 21:14
-
I don't have to be 100% sure, but I want to come close to never processing a file more than once. – ayengin Mar 27 '11 at 21:19
-
The archive file attribute is a legacy attribute generally reserved for archival/backup software, and its use is voluntary (and use is not really enforced in any manner), so unless your software is intended as backup software, I would hesitate to encourage you to use it. However I will append my answer to show a basic use of DosFileAttributeView (that has a specific method to query and set the archive bit for dos based systems). – JustinC Mar 27 '11 at 22:16
-
@JustinC I am using sshj as the Java SSH library and Quartz for scheduling file access, so storing a hash is infeasible. I can already access the file attributes, but the archive attribute will not do the job. It would be better to set the last-modified date to a certain date and compare against that. But if I could use the two unused bits of the file attributes, that would be great. – ayengin Mar 29 '11 at 19:14
Can you rename the file (e.g. "filename.archive")? or into an "archive" subdirectory?
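If moving is an option, a minimal sketch could look like the following. It assumes the directory is reachable as a local or mounted path (over SSH/SFTP the equivalent would be a remote rename); the class name `ArchiveMover`, the method `moveToArchive`, and the subdirectory name "archive" are hypothetical choices for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: moves a processed file into an "archive" subdirectory.
final class ArchiveMover {
    static Path moveToArchive(Path file) throws IOException {
        // The archive directory sits next to the file and is created on demand.
        Path archiveDir = file.toAbsolutePath().getParent().resolve("archive");
        Files.createDirectories(archiveDir);
        // Throws FileAlreadyExistsException if a file with the same name was archived before.
        return Files.move(file, archiveDir.resolve(file.getFileName()));
    }
}
```

Anything still sitting in the top-level directory is then unprocessed by definition, so no separate bookkeeping is needed.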

-
These files are log files that will be used when needed, so I can't rename them, and I can't change their extensions either. – ayengin Mar 27 '11 at 21:23
-
The main problem is that I am trying not to affect any other application that also uses the same files for a different purpose. – ayengin Mar 27 '11 at 21:25