
I have a scenario where I upload .csv files to a specific folder, /tmp/data_upload, every day, and the old files are replaced by the new ones.

I need to run a Python script once the data is uploaded. My idea is to create a cron job that monitors the files for changes. I tried using inotify, but I am not very familiar with Unix. How can I do that?

I need to execute the script test.py whenever the modification date of a file in the upload folder (for example, /tmp/data_upload) changes.

Peter Mortensen
Alex
  • Have you looked at http://eradman.com/entrproject/ ? I haven't tried it myself, but it looks like it may be related. – O.O. Jan 07 '19 at 11:57
  • FYI, Python has `inotify` libraries available. See one of my answers here for an example: https://askubuntu.com/a/939392/295286 – Sergiy Kolodyazhnyy Jan 08 '19 at 02:11

4 Answers


You might need incrond (the inotify cron daemon), which monitors changes to files and then executes scripts.

Incrond can monitor file creation, modification, deletion, and many more events. This article shows which events incrond can monitor, with some examples.

For your case, you might create the file /etc/incron.d/data_upload with the following contents:

/tmp/data_upload IN_CREATE,IN_MODIFY /path/to/test.py 
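If you go the incron route, incrontab(5) documents wildcards that incron expands before running the command: $@ becomes the watched path and $# the name of the file that triggered the event. A minimal sketch of a test.py that takes the changed file's path as its first argument might look like the following. It assumes an entry along the lines of /tmp/data_upload IN_CLOSE_WRITE /usr/bin/python3 /path/to/test.py $@/$# ; IN_CLOSE_WRITE (a file opened for writing was closed) is swapped in here for IN_MODIFY, which fires on every write and so may trigger the script while an upload is still in progress. The interpreter path, the function names, and the row-counting step are hypothetical placeholders for your own processing.

```python
#!/usr/bin/env python3
"""Hypothetical test.py skeleton, meant to be called by an incron entry like:

    /tmp/data_upload IN_CLOSE_WRITE /usr/bin/python3 /path/to/test.py $@/$#

incron replaces $@ with the watched directory and $# with the file name,
so the full path of the file that triggered the event arrives as sys.argv[1].
"""
import csv
import sys
from pathlib import Path
from typing import Optional


def count_data_rows(path: Path) -> int:
    """Placeholder processing step: count the rows after the header line."""
    with path.open(newline="") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header row (no-op for an empty file)
        return sum(1 for _ in reader)


def handle_event(path_arg: str) -> Optional[int]:
    """Process one changed file; return its row count, or None if skipped."""
    path = Path(path_arg)
    # incron fires for every file in the directory, so ignore anything that
    # is not an existing .csv file (e.g. temporary or already-deleted files)
    if path.suffix.lower() != ".csv" or not path.is_file():
        return None
    rows = count_data_rows(path)
    print(f"{path}: {rows} data rows")
    return rows


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("usage: test.py /path/to/file.csv", file=sys.stderr)
    else:
        handle_event(sys.argv[1])
```

If your upload tool writes to a temporary name and then renames the file into place, the rename shows up as IN_MOVED_TO rather than IN_CLOSE_WRITE, so that event may need to be added to the mask.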
Jenny D
victoroloan
  • Whilst this may theoretically answer the question, [it would be preferable](http://meta.stackoverflow.com/q/8259) to include the essential parts of the answer here, and provide the link for reference. – Gerald Schneider Jan 07 '19 at 09:10
  • Thanks for reminding me, I have added the context for the link. – victoroloan Jan 07 '19 at 09:19
  • Thanks for the answer. Just to verify the steps: after installing incron, I should execute `incrontab -e` as root and then include this line: `/tmp/data_upload IN_CREATE,IN_MODIFY test.py`? So once I upload a new file, it should execute test.py? Where should I place the test.py file? Do I need to provide an absolute path for it? – Alex Jan 07 '19 at 09:56
  • I think it is better to use the absolute path for your script. You can also check the cron or system log if the script does not seem to work. – victoroloan Jan 07 '19 at 10:18
  • Can you also document which file you are referring to with your code block? People who are not familiar with the syntax of Incrond (like me) may think you are referring to a command that you have to execute on the command line. – Ferrybig Jan 07 '19 at 15:08
  • @Ferrybig I think I added enough information to cover that now – Jenny D Jan 08 '19 at 09:38

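You could use entr to automatically run the script every time a file changes, by running the following once at startup:

ls -d /tmp/data_upload/* | entr -p /path/to/test.py

entr reads the list of files to watch from standard input, so it needs their full paths (a plain ls /tmp/data_upload would print bare file names), and -p postpones the first run until a file actually changes. entr only watches the files that exist when it starts, so if new uploads arrive under new names, look at its -d option, which makes entr exit when a file is added to a watched directory so that it can be restarted in a shell loop.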

Project website: http://eradman.com/entrproject/

Online man page: https://www.systutorials.com/docs/linux/man/1-entr/

jln-ho

The watchexec (https://watchexec.github.io/) command-line utility sounds like exactly what you need, although I believe that installing it requires the Rust build tools on your machine, which may be a dealbreaker.

TeNNoX
Ben Sandeen

My general approach would be to fiddle with the classical Unix find utility. For example, the command

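find /tmp/data_upload -name '*.csv' -mtime -1 -exec /home/myname/test.py {} +

will find any .csv files in /tmp/data_upload that have been modified less than one day (24 hours) ago and, if it finds any, run your test.py with those files as arguments. Note that -exec needs a terminator: {} + runs the script once for a batch of matches, while \; would run it once per file. The script must also be executable (chmod +x) and start with a shebang line such as #!/usr/bin/env python3. Of course, if your test.py file is in some other directory, update the path to it accordingly.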
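For comparison, a rough Python equivalent of this find -mtime -1 check is sketched below, assuming it is started from cron like the find command. The directory comes from the question; SCRIPT is a placeholder path and the function names are hypothetical. Like find, it descends into subdirectories, and like -exec ... {} + it runs the script once with all matching files as arguments.

```python
#!/usr/bin/env python3
"""Rough Python equivalent of the find -mtime -1 check, meant to be run from cron.

UPLOAD_DIR comes from the question; SCRIPT is a placeholder path.
"""
import subprocess
import sys
import time
from pathlib import Path
from typing import List, Optional

UPLOAD_DIR = Path("/tmp/data_upload")
SCRIPT = Path("/home/myname/test.py")  # placeholder, adjust to your script
MAX_AGE_SECONDS = 24 * 60 * 60         # "-mtime -1": modified less than 24 hours ago


def recently_modified(directory: Path, max_age_seconds: float,
                      now: Optional[float] = None) -> List[Path]:
    """Return .csv files under `directory` (recursively, like find) whose
    modification time is less than `max_age_seconds` before `now`."""
    if not directory.is_dir():
        return []
    now = time.time() if now is None else now
    return sorted(
        p for p in directory.rglob("*.csv")
        if p.is_file() and now - p.stat().st_mtime < max_age_seconds
    )


def main() -> None:
    changed = recently_modified(UPLOAD_DIR, MAX_AGE_SECONDS)
    if changed:
        # Like "-exec ... {} +": run the script once with all matches as arguments
        subprocess.run([sys.executable, str(SCRIPT), *map(str, changed)], check=False)


if __name__ == "__main__":
    main()
```

Passing an explicit now makes the age check easy to test; for an hourly cron job, MAX_AGE_SECONDS would be 60 * 60 to mirror -mmin -60.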

If you run your cron job more often than once a day, you can use find's -mmin option to specify the maximum time since modification in minutes. For example,

find /tmp/data_upload -name '*.csv' -mmin -60 -exec /home/myname/test.py {} +

will search for .csv files that were modified less than 60 minutes ago, which is useful if cron runs the job hourly.

Two fair warnings are in order: First, this won't catch .csv files that you entirely deleted. You may want to check for these separately. Second, I did not have time to test any of this. Expect typos in my code that you'll have to debug by yourself.
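Regarding the first warning: one way to also catch deletions, and to avoid depending on the cron interval matching the -mmin window, is a different technique from find's time window: keep a snapshot of the .csv files' modification times between runs and compare it with the current directory contents. The Python sketch below assumes a hypothetical state file in the system temporary directory and uses print calls as placeholders for running test.py. A reboot that clears the temporary directory would make every file look newly added on the next run.

```python
#!/usr/bin/env python3
"""Sketch: detect added, modified and deleted .csv files between cron runs
by comparing a saved snapshot of modification times. STATE_FILE is hypothetical."""
import json
import tempfile
from pathlib import Path
from typing import Dict, List, Tuple

UPLOAD_DIR = Path("/tmp/data_upload")
STATE_FILE = Path(tempfile.gettempdir()) / "data_upload_state.json"  # hypothetical


def snapshot(directory: Path) -> Dict[str, int]:
    """Map each .csv path under `directory` to its mtime in nanoseconds."""
    if not directory.is_dir():
        return {}
    return {str(p): p.stat().st_mtime_ns for p in directory.rglob("*.csv") if p.is_file()}


def diff(old: Dict[str, int], new: Dict[str, int]) -> Tuple[List[str], List[str], List[str]]:
    """Return sorted (added, modified, deleted) paths between two snapshots."""
    added = sorted(new.keys() - old.keys())
    deleted = sorted(old.keys() - new.keys())
    modified = sorted(p for p in new.keys() & old.keys() if new[p] != old[p])
    return added, modified, deleted


def load_state(path: Path) -> Dict[str, int]:
    """Load the previous snapshot; a missing or corrupt file counts as empty."""
    try:
        return json.loads(path.read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        return {}


def main() -> None:
    old = load_state(STATE_FILE)
    new = snapshot(UPLOAD_DIR)
    added, modified, deleted = diff(old, new)
    if added or modified:
        print("changed:", *added, *modified)  # placeholder: run test.py here
    if deleted:
        print("deleted:", *deleted)           # placeholder: handle deletions here
    STATE_FILE.write_text(json.dumps(new))


if __name__ == "__main__":
    main()
```

Comparing st_mtime_ns for equality, rather than against a time window, means each change is reported exactly once, however often cron runs the script.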