
I've set up a cron job to run a Python script to scrape some web pages.

/etc/crontab

    SHELL=/bin/bash
    PATH=/sbin:/bin:/usr/sbin:/usr/bin
    MAILTO=my_email_address@domain.com

    # For details see man 4 crontabs

    # Example of job definition:
    # .---------------- minute (0 - 59)
    # |  .------------- hour (0 - 23)
    # |  |  .---------- day of month (1 - 31)
    # |  |  |  .------- month (1 - 12) OR jan,feb,mar,apr ...
    # |  |  |  |  .---- day of week (0 - 6) (Sunday=0 or 7) OR sun,mon,tue,wed,thu,fri,sat
    # |  |  |  |  |
    # *  *  *  *  * user-name  command to be executed

    */2 * * * * root /usr/bin scrapy crawl mycrawler

However, the emails are informing me that...

/bin/bash: /usr/bin: Is a directory

When I manually run the script, it pipes data into my database, but when the cron job executes the script, nothing...

What does the /bin/bash: /usr/bin: Is a directory message allude to?!

oldboy

2 Answers

0
/usr/bin is a standard directory on Unix-like operating systems that contains most of the executable files.

In your entry, the first word of the command is /usr/bin, so cron tries to execute that directory and passes "scrapy crawl mycrawler" to it as arguments. A directory cannot be executed, hence the message.
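
To see where the message comes from, the command field can be handed to bash directly, which is roughly what cron does with it. This is only a sketch: it assumes bash lives at /bin/bash (as in the SHELL line of the crontab) and that /usr/bin exists; `out` and `status` are names made up for the example.

```shell
# Run the command field from the crontab entry through the shell, as cron would.
# bash takes the first word, /usr/bin, as the program to execute.
status=0
out=$(LC_ALL=C /bin/bash -c '/usr/bin scrapy crawl mycrawler' 2>&1) || status=$?

echo "$out"                  # bash reports that /usr/bin is a directory
echo "exit status: $status"  # 126 is bash's status for "found but not executable"
```

The words after /usr/bin never get a chance to run, which is why nothing reaches the database.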

One option is to execute a bash script (assuming the bash binary is in the /usr/bin directory):

*/2 * * * * root /usr/bin/bash scrapy.sh

Or a python command (again assuming the python binary is in the /usr/bin directory):

*/2 * * * * root /usr/bin/python scrapy.py

Or you could add the directory that contains scrapy to the PATH variable in your crontab, so that the bare command is found:

*/2 * * * * root scrapy crawl mycrawler
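
A rough way to check whether a given PATH will find a command is to run the lookup with nothing but that PATH set. The sketch below uses a made-up stand-in tool (`mytool_demo`) in a scratch directory instead of scrapy, so it does not depend on where scrapy is installed; `bindir`, `cron_path`, `without` and `with` are likewise names invented for the example.

```shell
# Hypothetical stand-in for scrapy, placed in a scratch directory.
bindir=$(mktemp -d)
printf '#!/bin/sh\necho "mytool_demo ran"\n' > "$bindir/mytool_demo"
chmod +x "$bindir/mytool_demo"

# PATH exactly as set in the crontab from the question.
cron_path=/sbin:/bin:/usr/sbin:/usr/bin

# Look the tool up with only the crontab's PATH: it is not found.
without=$(env -i PATH="$cron_path" /bin/sh -c 'command -v mytool_demo || echo "not found"')

# Look it up again with the tool's directory appended: the absolute path is printed.
with=$(env -i PATH="$cron_path:$bindir" /bin/sh -c 'command -v mytool_demo || echo "not found"')

echo "crontab PATH only:       $without"
echo "with directory appended: $with"

rm -rf "$bindir"
```

For the real case, running `command -v scrapy` in your normal shell prints the absolute path of scrapy; the directory part of that path is what needs to be in the crontab's PATH.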
Jesse
  • it's funny because on my other question i was just informed that my code would be fine :/ that i could do this without creating and referencing a shell script – oldboy Jul 02 '18 at 05:32
  • surely it's possible without creating and referencing a shell script... i mean, `scrapy crawl mycrawler` is a shell command... – oldboy Jul 02 '18 at 05:34
  • You can do it without creating a shell script. Python binary needs to be either referenced i.e. /usr/bin/python script.py (where script.py is a python script), or python needs to be part of the path. The path is where your system will look for binaries consecutively to execute a command. – Jesse Jul 02 '18 at 05:36
  • i believe i need to execute the script by issuing the command `scrapy crawl mycrawler` – oldboy Jul 02 '18 at 05:38
  • Okay you need to add scrapy to your PATH variable then. I assume scrapy is binary? – Jesse Jul 02 '18 at 05:39
  • ill try that. scrapy is also dependent on python so? – oldboy Jul 02 '18 at 05:40
  • As long as python is also in your PATH, you shouldn't have any trouble. – Jesse Jul 02 '18 at 05:40
  • still getting errors with the absolute path... :/ `Scrapy 1.5.0 - no active project Unknown command: crawl Use "scrapy" to see available commands` – oldboy Jul 02 '18 at 05:57
0

As discussed in the comments, the initial error is that the entry places /usr/bin where the executable should be:

*/2 * * * * root /usr/bin scrapy crawl mycrawler
                 ^^^^^^^^
                 command

Once that is fixed to be scrapy, the remaining issue is that scrapy is in /usr/local/bin, which is not in the PATH set in your crontab. To change this:

PATH=/sbin:/bin:/usr/sbin:/usr/bin:/usr/local/bin/

And then you should be able to just do:

 */2 * * * * root cd <project dir> && scrapy crawl mycrawler
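
Before waiting two minutes for the next run, the job's environment can be approximated from a normal shell with `env -i`, which starts from an empty environment. This is a sketch, not an exact replica of cron (cron also sets a few variables of its own, such as HOME and LOGNAME); it assumes bash is at /bin/bash, and `cron_path` and `out` are names made up for the example.

```shell
# PATH as it would read in the crontab after appending /usr/local/bin.
cron_path=/sbin:/bin:/usr/sbin:/usr/bin:/usr/local/bin

# Start from an empty environment and set only what the crontab sets,
# then print what a job would see.
out=$(env -i SHELL=/bin/bash PATH="$cron_path" /bin/bash -c '
    echo "PATH=$PATH"
    echo "PYTHONPATH=${PYTHONPATH:-not set}"
')

echo "$out"
```

Replacing the two echo lines with `cd <project dir> && scrapy crawl mycrawler` runs the actual job under the same restricted environment, so any "command not found" or import error shows up on the terminal instead of in an email.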
Matthew Story
  • won't that overwrite `PATH` altogether? i have other things in path that you haven't listed. wouldn't `PATH=$PATH:...` be a lot more appropriate – oldboy Jul 02 '18 at 05:41
  • PATH in your crontab is always just what it is ... cron unsets all environment variables before running. In this case I merely appended /usr/local/bin to the path definition already defined in the crontab you posted in your OP. – Matthew Story Jul 02 '18 at 05:41
  • so it's basically an "isolated" version of `PATH`? that won't affect everything else that i've stored in `PATH`? – oldboy Jul 02 '18 at 05:42
  • yeah, it's just the path used in cron ... and as I said ... you've already set it in your crontab, we're just adding `/usr/local/bin` to it. you could prepend `$PATH` to that if it makes you feel better, but it won't do anything as cron unsets env. – Matthew Story Jul 02 '18 at 05:43
  • now i'm getting the error `Scrapy 1.5.0 - no active project -- Unknown command: crawl -- Use "scrapy" to see available commands`. that shouldn't be the case. – oldboy Jul 02 '18 at 05:46
  • You likely need to `cd` into a project directory, as discussed in chat. – Matthew Story Jul 02 '18 at 05:48
  • yes, trust me, it's working fine outside of the cron job. it's just this cron job bs :/ – oldboy Jul 02 '18 at 05:51
  • use the absolute path of the crawl and mycrawler files in the command – Jesse Jul 02 '18 at 05:51
  • Yeah ... it's also worth noting that `PYTHONPATH` is not set in cron ... so if you need it set you have to do so explicitly in your crontab – Matthew Story Jul 02 '18 at 05:54
  • @jesse crawl isn't a file, but i'll try using the absolute path for mycrawler. `cd` to the path apparently made the emails stop working and the script still isnt working :/ – oldboy Jul 02 '18 at 05:54
  • did you use the `&&` between the cd and the scrapy? – Matthew Story Jul 02 '18 at 05:55
  • so basically i need to add all of the paths of any dependency? – oldboy Jul 02 '18 at 05:59
  • yeah for PATH and PYTHONPATH ... as well as any other environment information you might typically assume (e.g. stuff loaded by your bashrc) as cron not only doesn't load your rc file, it also unsets all environment variables. – Matthew Story Jul 02 '18 at 06:01
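
On the "no active project" error from the comments: as far as I know, scrapy decides whether it is inside a project by looking for a scrapy.cfg file in the current directory and then in each parent directory, and `crawl` is only available inside a project. Cron does not start the job in your project directory, hence the `cd`. The function below is not scrapy's code, just a shell sketch of that kind of upward search; `find_scrapy_cfg` and the scratch directory names are made up for the example.

```shell
# Walk upward from a starting directory until a scrapy.cfg file is found.
find_scrapy_cfg() {
    dir=$(cd "$1" 2>/dev/null && pwd) || return 1
    while :; do
        if [ -f "$dir/scrapy.cfg" ]; then
            printf '%s\n' "$dir/scrapy.cfg"
            return 0
        fi
        # Reached the filesystem root without finding the file.
        [ "$dir" = "/" ] && return 1
        dir=$(dirname "$dir")
    done
}

# Scratch layout standing in for a real project (all names are hypothetical).
tmp=$(cd "$(mktemp -d)" && pwd)
mkdir -p "$tmp/myproject/myproject/spiders" "$tmp/elsewhere"
: > "$tmp/myproject/scrapy.cfg"

# Searching from inside the project finds the file; from outside it does not.
inside=$(find_scrapy_cfg "$tmp/myproject/myproject/spiders") || inside="no project"
outside=$(find_scrapy_cfg "$tmp/elsewhere") || outside="no project"

echo "from spiders/:  $inside"
echo "from elsewhere: $outside"

rm -rf "$tmp"
```

So in the crontab entry, `<project dir>` should be the directory that holds scrapy.cfg or any directory below it.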