1

In a Linux shell, I would like to treat a folder like a bag of files. Some processes put files into this bag. There is exactly one process that is always in one of the two following states:

  1. Process a document, then delete it
  2. Wait for an arbitrary document to exist in the folder, then process it

It does not matter in which order the documents are processed or what their name is.

What would the unique process taking files from the folder look like in bash? Processing means calling another program with the filename as argument.

Note that this unique process will not terminate until I do so manually.

Radio Controlled
  • 825
  • 8
  • 23
  • I can only think of a solution with an external file saving the status; otherwise, if you have 2 processes running on the same folder, you'll either process the same file twice or remove it while the other thread is processing it. – Yaron Jul 04 '16 at 14:03
  • No there is only one process taking files out. There are multiple putting files in, but only one process removing from the folder. – Radio Controlled Jul 04 '16 at 14:06

4 Answers

2

You can use incrond, which stands for "inotify cron daemon". It is a daemon that runs in the background and monitors the directories specified in a table. A valid configuration can be created with

incrontab -e

This will open an editor and you could type in the directory and actions you want to watch, e.g.,

/path/to/observed/directory IN_CREATE,IN_MOVED_TO <command> $@/$#

<command> is the command or your script that you want to execute if one of the events (IN_CREATE,IN_MOVED_TO) is triggered. $@/$# is the path to the file that was created or moved to the watched folder and will be passed to <command>. That is basically all you need to do to start watching the folder.
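For the processing described in the question, <command> could point at a small script along these lines. This is only a sketch: it is written as a shell function so it is easy to exercise, and the printf line is a stand-in for the real processing program, not part of incrond itself.

```shell
#!/bin/sh
# Hypothetical incrond handler: incrond invokes it with the new file's
# full path as its single argument ($@/$# in the table entry above).
handle() {
    f="$1"
    printf 'processing %s\n' "$f"  # stand-in for: your_program "$f"
    rm -- "$f"                     # consume the file from the bag
}
```

In a real script the body would just be `handle "$1"`; because incrond fires one invocation per event, no polling loop is needed.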

You will have to initialize incrond once by telling it which users may use the service. You can allow users by adding them to /etc/incron.allow, e.g.,

echo 'root' >> /etc/incron.allow
echo '<username>' >> /etc/incron.allow

Note that root must also be in /etc/incron.allow. Now you can start the daemon by simply calling

incrond


1

inotify-tools would be the ideal tool to get instant notifications of changes in a directory. Once you get the notification of a new file's creation, you can process it and delete it.
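A sketch of that idea: the consumer loop is written as a function so it can be fed from the inotifywait pipeline shown in the comment (the flags are the usual inotify-tools ones; the printf line is a stand-in for the real processing program):

```shell
#!/bin/sh
# Consume one pathname per line from stdin: process it, then delete it.
consume() {
    while IFS= read -r f; do
        [ -f "$f" ] || continue        # the file may have vanished meanwhile
        printf 'processing %s\n' "$f"  # stand-in for: your_program "$f"
        rm -- "$f"
    done
}

# Real use (-m keeps inotifywait monitoring forever):
#   inotifywait -m -e create -e moved_to --format '%w%f' /path/to/dir | consume
```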

Fazlin
  • 2,285
  • 17
  • 29
  • This will be complicated because multiple files might be created while I am processing one. So I would have to build a stack of added files. This is possible of course but I thought there might be an easier solution given that the order of files being put in the folder does not matter. – Radio Controlled Jul 04 '16 at 14:09
  • Read this https://techarena51.com/index.php/inotify-tools-example/. They have given examples of how to create a script and pass to inotifywait (in your case `create` event). – Fazlin Jul 04 '16 at 14:14
  • 1
    `inotify` will create notifications for every file creation inside a directory, and for each notification received, you can do whatever processing you want. As far as I know, this is the cleanest solution. Would be happy to be proven wrong of course!! – Fazlin Jul 04 '16 at 14:18
  • 1
    Another link: http://stackoverflow.com/questions/24567608/using-inotify-in-a-script-to-monitor-a-directory – Fazlin Jul 04 '16 at 14:20
0

The simplest way might be to just loop over the files, call the processing program on each and remove the files:

for f in /path/to/folder/*; do
    program_that_processes_file "$f"
    rm -- "$f"
done

There is a slight problem with this though: If the program that creates the files doesn't do it atomically, you could process and maybe even remove files that aren't fully written yet. The simplest solution to that is to have the writer write into a temporary directory, and then when the file is all done, move it to the final destination directory.
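The writer side of that hand-off could look like the sketch below (`deliver` and the directory arguments are illustrative names, and the printf stands in for the real producer). The key point is that the staging directory sits on the same filesystem as the destination, so mv is a single atomic rename(2):

```shell
#!/bin/sh
# Writer-side sketch: stage the file where the consumer cannot see it,
# then move it into the watched folder in one atomic step.
deliver() {
    bag="$1"       # the watched "bag" directory
    staging="$2"   # staging dir on the SAME filesystem as the bag
    tmp=$(mktemp "$staging/doc.XXXXXX")
    printf 'document body\n' > "$tmp"  # stand-in for: write_document > "$tmp"
    mv -- "$tmp" "$bag/"               # rename(2): readers never see a partial file
}
```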

Some programmer dude
  • 400,186
  • 35
  • 402
  • 621
  • Won't this terminate as soon as no file exists anymore? And what does `od` closing the for-loop mean? – Radio Controlled Jul 04 '16 at 14:05
  • 1
    @RadioControlled Then have the loop in another loop. Or possibly use a cron-script that invokes this once every few seconds. Or use a directory-watching program (there are a few of these around) that calls this script once there are files in the directory. – Some programmer dude Jul 04 '16 at 14:09
  • oh I just realized that 'od' is closing 'do'... I always used 'done' – Radio Controlled Jul 05 '16 at 08:55
  • @RadioControlled Oops, a little brain fart, it should of course be `done`. Must have mixed it with some other language... Sorry for the confusion. :) – Some programmer dude Jul 05 '16 at 11:54
0

How about this:

while true; do
    files=( * )                        # glob instead of parsing ls; stays a literal '*' if the folder is empty
    if [[ ! -e ${files[0]} ]] || lsof -c write_process -- "${files[0]}" > /dev/null 2>&1; then
        sleep 5                        # nothing to process, or the first file is still being written
    else
        process "${files[0]}"
        rm -- "${files[0]}"
    fi
done

...or this (might be unhealthy, since it busy-waits instead of sleeping):

while true; do
    files=( * )
    if [[ -e ${files[0]} ]] && ! lsof -c write_process -- "${files[0]}" > /dev/null 2>&1; then
        process "${files[0]}"
        rm -- "${files[0]}"
    fi
done

As was pointed out, I have to ensure that each file is completely written before it is processed. I do this by using lsof to check whether the writing process still has the file open. However, I am not sure yet how to address multiple running instances of the same program as one process...

Radio Controlled
  • 825
  • 8
  • 23
  • This works. Just make sure that it is not always/often the first file in alphabetical order which is added (otherwise ${files[0]} will always/often be open as long as the write_processes are working). – Radio Controlled Jul 07 '16 at 09:57