
I am trying to create a script that will process modified/new files in a directory (which is being mirrored from a remote directory by lftp, but that's another story).

To keep track of the files that are modified I use fswatch. I then convert the files detected by fswatch from XML to JSON and store them in a separate directory. To make sure that I can stop this conversion once there are no more files to process (i.e. when the mirroring job is over), I watch for a file that the mirroring process creates upon completion.

My script works, BUT for some strange reason I do not see the JSON files until the mirroring job is completed. It's as if the converted files are stored somewhere in memory, and as soon as the 'stopping' condition is true those files magically appear in the directory.

Is this normal behaviour? How can I make the files appear as soon as they are processed? And in what ways could I optimize what I am trying to achieve? (I'm a newbie in bash... and in programming in general.)

Here's the script that I use:

my_convert_xml_to_json_function () {
    if [ -f "$1" ]; then
        # base64-encode the path (relative to the watched directory) to get a flat file name;
        # printf avoids the trailing newline that echo would feed into base64
        temporary_file_name_for_json=$(printf '%s' "${1/$path_to_xml_files\/}" | base64)
        xml2json < "$1" | jq -rc '.amf' > "${path_to_json_files}/${temporary_file_name_for_json}.txt"
    fi
}
export -f my_convert_xml_to_json_function
export path_to_xml_files
export path_to_json_files

# repeat watching for files until the mirroring is over
# (the file name is passed to bash as a positional parameter rather than being
# spliced into the command string, so quotes or spaces in file names are safe)
fswatch -0 --event Updated --event Created "${path_to_xml_files}" | grep -ai 'xml$' | xargs -0 -n 1 bash -c 'my_convert_xml_to_json_function "$1"' bash &

temporary_pid_of_fswatch=$(jobs -p)
echo "This is the PID of the last command in the pipeline: $!; this is the PID of fswatch: ${temporary_pid_of_fswatch}"


# now check for the existence of a stopping rule
while [[ $(shopt -s nullglob; set -- "${my_temporary_files}"/xml-mirrorring-started-on-*-is-completed.txt; echo $#) -eq 0 ]]; do
    # the completion marker is not there yet, so keep waiting
    sleep 1 && temp_continue_check="running $(date)"
    echo "Stop condition not met yet (${temp_continue_check})."
done
# stop fswatch and move the completion marker generated by the mirror into the trashcan
kill -15 "${temporary_pid_of_fswatch}" && mv -v "${my_temporary_files}"/xml-mirrorring-started-on-*-is-completed.txt "$my_trashcan"

EDIT: following the comment from @snorp: if I add sync to the script, then I do get 'real time' updating of the files. Otherwise, the files are somewhere in the air... If a process is running in the background and I type sync, I get a new process that seems to 'freeze' (based on top output I can see it's doing something), but I don't see the processed files written into the folder like they (eventually) should be. Is there any way to force OSX to actually write these files to disk (without including sync in the script)?
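A minimal sketch of one possible workaround, assuming the delay is caused by the intermediate grep/xargs stages buffering the NUL-delimited stream rather than by the filesystem itself (untested; changed_file is a variable name introduced here): let bash read fswatch's output directly, so each path is handled as soon as it is reported.

# untested sketch: consume fswatch's NUL-delimited output directly;
# the case pattern roughly replaces the grep 'xml$' filter
fswatch -0 --event Updated --event Created "${path_to_xml_files}" |
    while IFS= read -r -d '' changed_file; do
        case "$changed_file" in
            *.xml|*.XML) my_convert_xml_to_json_function "$changed_file" ;;
        esac
    done &

Because the function is then called from the same shell pipeline rather than through xargs, the export -f line would no longer be needed in this variant.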

econ
  • If the files are small you're likely correct that they're being held in memory. A flush command after each file might achieve what you're after. If you're just interested in seeing what file is being processed, you could also just throw in a touch ${path_to_json....}/${temporary...}, which would create the file first so you could see it before it is completely written (a sketch of this idea follows the comment thread below). – snorp Apr 20 '16 at 01:06
  • The files are very small, less than 3KB per file on average. Uhm... is it possible then that I lose information if there are more files in memory than memory can handle? – econ Apr 20 '16 at 01:16
  • Not unless your RAM is bad :). As a side note this has been the default behavior in Linux for a while. Also I said flush command, but rather the command is sync and it flushes memory and causes it to write to disk. Man page at http://linux.die.net/man/8/sync – snorp Apr 20 '16 at 01:37
  • @snorp: thanks, I'm on a Mac, so hopefully their RAM is as good as it is expensive. – econ Apr 20 '16 at 01:41
  • I think your usage of xargs makes it slurp up the filenames until the whole process is stopped; then xargs starts the script, which creates the files. – Gunstick May 02 '16 at 12:56
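For reference, a minimal sketch of snorp's touch suggestion applied to the conversion function above (untested; json_output is a helper variable introduced here). Note that this only makes an empty placeholder visible immediately; it does not flush the converted content to disk any earlier:

my_convert_xml_to_json_function () {
    if [ -f "$1" ]; then
        temporary_file_name_for_json=$(printf '%s' "${1/$path_to_xml_files\/}" | base64)
        json_output="${path_to_json_files}/${temporary_file_name_for_json}.txt"
        touch "$json_output"    # create the (empty) file up front so it shows up right away
        xml2json < "$1" | jq -rc '.amf' > "$json_output"
    fi
}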

0 Answers