Shorter question:

Make targets depend on files; call one such dependency "D". As Make traverses its dependency graph, I would like it to also require, for each "D", that success is recorded in a log of the exit status of D's recipe ("D.status.log"; for simplicity's sake, it contains just the process exit status or the string "Started"). Is this possible without digging into Make's source and modifying the graph logic myself? (I.e., has somebody already written this as a patch or as another Make-like utility?)

Details:

I am a fan in spirit of using Makefiles to run data processing workflows. I am not alone, as searching for "makefile data" yields a few like-minded folks.

However, in practice, I find it a glorious pain in the neck. Multi-step processes generate output from programs that don't necessarily finish. Running a multi-step workflow on thousands of input files means cobbling together some find ... rm commands, which feels like a fragile data management strategy.

Basically, I'd like a well-logged Make for data with the following style of interface; I'll call it fantasymake below.

Makefile:

all: results1 results2
results1: script input1
    script input1 >results1
results2: script input2
    script input2 >results2
results2beyond: script results2
    script results2 >results2beyond

Example directory tree before:

Makefile
input1
input2

Directory after running fantasymake:

Makefile
input1
input2
results1
results1.err.log
results1.out.log
results1.status.log
results2
results2.err.log
results2.out.log
results2.status.log
results2beyond
results2beyond.err.log
results2beyond.out.log
results2beyond.status.log

Presently, I could get the logs with this bit of Bash, but I haven't found a graceful way to integrate these wrapper commands into Makefile rules:

echo Started. >results.status.log
some_program >results.out.log 2>results.err.log
echo $? >results.status.log

(Recall that every non-joined line in a Makefile recipe runs in a separate shell, so an in-Makefile wrapper would need a continuation line (backslash) between some_program ... and echo $$? to make sure they are both executed in the same shell.)
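
The three wrapper commands above could also be bundled into one shell function, so that a recipe only has to make a single call. This is a minimal sketch, assuming a POSIX shell; run_logged is a hypothetical name, not something Make provides, and the log format is the simplified one described in this question.

```shell
# Hypothetical helper: run_logged TARGET COMMAND [ARGS...]
# Writes TARGET.out.log, TARGET.err.log and TARGET.status.log, then returns
# the command's exit status so that the caller (e.g. make) still sees failures.
run_logged() {
    target=$1
    shift
    echo "Started" >"$target.status.log"
    "$@" >"$target.out.log" 2>"$target.err.log"
    status=$?
    # Overwrite "Started" with the numeric exit status.
    echo "$status" >"$target.status.log"
    return "$status"
}
```

Because the status is captured inside the function, the command and the status line cannot end up in different shells, provided the recipe loads the function and calls it on one line.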

Back to the fantasymake behaviors, this would be the directory after running fantasymake clean:

Makefile
input1
input2

Suppose that while running fantasymake, results2 failed or was terminated (and suppose we didn't run fantasymake clean). Then results2beyond would not get generated. Here is where I don't think I can rely on unmodified Make: results2.status.log records that results2 failed, so fantasymake would not proceed to results2beyond on the next invocation.
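
To make the intended check concrete, here is a rough sketch of the test fantasymake would apply before building a target. It assumes a POSIX shell and the simplified log format (a bare exit status or the string "Started"); status_ok is a hypothetical name.

```shell
# Hypothetical helper: status_ok FILE...
# Succeeds only if every FILE has a FILE.status.log containing exactly "0".
# A missing log, "Started", or a nonzero status all count as failure.
status_ok() {
    for dep in "$@"; do
        log="$dep.status.log"
        [ -f "$log" ] || return 1
        [ "$(cat "$log")" = "0" ] || return 1
    done
    return 0
}
```

Plain inputs such as input2 or script have no status log, so a check like this would only be given generated prerequisites such as results2.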

To get the build to finish, a clean-failed rule could sweep away erroneous results. You may need this if you have, say, a database dependency (or live connection) that was easier to leave out of Make. Here's what the directory would look like after running fantasymake clean-failed instead of fantasymake clean:

Makefile
input1
input2
results1
results1.err.log
results1.out.log
results1.status.log
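
Outside of Make, the sweep that clean-failed performs might look like the following. This is only a sketch, assuming a POSIX shell and that a status log containing anything other than 0 marks a failed or interrupted target; clean_failed is a hypothetical name.

```shell
# Hypothetical helper: clean_failed [DIR]
# For every *.status.log in DIR (default: current directory) whose content is
# not exactly "0", remove the target file and its three log files.
clean_failed() {
    dir=${1:-.}
    for log in "$dir"/*.status.log; do
        [ -f "$log" ] || continue    # the glob matched nothing
        if [ "$(cat "$log")" != "0" ]; then
            target=${log%.status.log}
            rm -f "$target" "$target.out.log" "$target.err.log" "$log"
        fi
    done
}
```

Targets that never started, such as results2beyond here, have no status log and are simply skipped.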

Suppose after running fantasymake clean-failed, script is updated. Then running fantasymake would regenerate results1 and its logs alongside results2.

From glancing at Wikipedia (List of build automation software), it looks like none of makepp, omake, or cmake do the trick. The list on that page (I lack the reputation to link anymore) is a bit lengthy, so I turn to this lovely crowd that has helped lurking me many times already.

Is this an extension I'd have to hack together, or does it already exist?

2 Answers

For the wrappers, this is trivial if you use GNU make. Just use a user-defined function:

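TARGETS = one two three

# Invoke this with $(call LOG,<cmdline>)
# The lines are joined with backslashes so that they run in a single shell;
# otherwise $$? would not see the exit status of the wrapped command.
define LOG
  echo "$$(date): Started." >'$@'.status.log; \
  ($1) >'$@'.out.log 2>'$@'.err.log; \
  status=$$?; \
  echo "$$(date): Completed: $$status" >>'$@'.status.log; \
  exit $$status
endef

all: $(TARGETS)

$(TARGETS):
	$(call LOG, echo "$@ out"; echo "$@ error" 1>&2)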

I'm not really sure what exactly you're trying to accomplish with the "clean" stuff. If you just want a target clean-failed that will remove the logs for any target which doesn't exist, that's simple enough:

TARGETS = one two three

clean-failed:
        for t in $(TARGETS); do [ -f "$$t" ] || rm -f "$$t".*.log; done

The rest of your requirements sound, to me, like standard make functionality.

MadScientist
  • That is close, definitely good on the creating-logs bit (though, I forgot to join the shells of the command and the exit status line, so as we both wrote it the log would always end "0".). However, I wasn't clear on the purpose of '.status.log': **To detect when an intermediary job failed.** I'll update the question. – Alex Nelson May 10 '13 at 19:07
  • There, it's updated now. I'd love to hear if the hooks while traversing the dependency graph are another part of standard make functionality, but I looked for and couldn't find any indications this was so. – Alex Nelson May 10 '13 at 19:16
  • Correct, you would have to write the commands in a single statement to use a function like this. As for the clean behavior, one way to have that work is to list each target as a prerequisite of the next one, but this would lose all your parallelism. The only other way would be to do it in the shell. – MadScientist May 10 '13 at 19:41
  • Thank you, @MadScientist. It sounds like you're giving a vote for this being an implementation project? – Alex Nelson May 12 '13 at 18:35

I think you can achieve this with regular make; you just have to be a bit smarter about how you set up your rules. Specifically, don't put your results file in place until you are sure it is complete and consistent. Change your makefile like this:

all: results1 results2
results1: script input1
    script input1 >results1.tmp && mv results1.tmp results1
results2: script input2
    script input2 >results2.tmp && mv results2.tmp results2
results2beyond: script results2
    script results2 >results2beyond.tmp && mv results2beyond.tmp results2beyond

Now if the power dies or your disk fills up or something like that, the workflow will pick up wherever it left off. Any result files that exist are guaranteed to be complete and consistent, because the shell will not execute the mv command unless the previous command finished successfully.
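
The same write-then-rename idea can be sketched as a reusable shell function. This assumes a POSIX shell; atomic_to is a hypothetical name. Unlike the recipes above, this version also deletes the temporary file when the command fails.

```shell
# Hypothetical helper: atomic_to OUTPUT COMMAND [ARGS...]
# Runs COMMAND with stdout going to OUTPUT.tmp and renames it to OUTPUT only
# if the command exits 0; on failure the temporary file is removed and the
# command's exit status is returned.
atomic_to() {
    out=$1
    shift
    if "$@" >"$out.tmp"; then
        mv "$out.tmp" "$out"
    else
        status=$?
        rm -f "$out.tmp"
        return "$status"
    fi
}
```

For example, `atomic_to results1 script input1` would leave results1 in place only after script has finished successfully.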

UPDATE:

If you're using GNU make you can simplify the makefile somewhat:

PROCESS=script $< > $@.tmp && mv $@.tmp $@
all: results1 results2
results%: input% script
    $(PROCESS)

results2beyond: results2 script
    $(PROCESS)

Depending on how determined you are, you can probably simplify this even more, but that's left as an exercise for the reader.

Eric Melski
  • This is almost what I want. Someone on the help-make mailing list thinks this is the answer to my question; so, I don't think I've asked the question correctly yet. – Alex Nelson May 15 '13 at 04:51
  • Ok, I've hopefully asked better now. The issue I have with your answer is that it isn't quite generic enough to wrap in a function like @MadScientist suggested. The dependency graph would end up with a lot of boilerplate, a hint that there exists a programmatic solution to all that extra code. I [tried](https://gist.github.com/ajnelson/5581508) merging your two answers, and ended up wanting to just modify target names to something like 'realtarget.done'. I'd probably settle for that, but wouldn't be entirely satisfied. – Alex Nelson May 15 '13 at 05:12
  • @AlexNelson I updated my answer with some guidance on how you can refactor the makefile to eliminate redundancy. – Eric Melski May 15 '13 at 07:20