12

I'm looking for an implementation of a 'cacheme' command that 'memoizes' the output (stdout) of whatever command is passed in ARGV. If it has never run that command, it runs it and stores the output. If it has already run it, it just prints the stored output from the cache file (or, even better, replays both output and error to &1 and &2 respectively).

Suppose someone had written this command; it would work like this:

$ time cacheme sleep 1    # first time it takes one sec
real   0m1.228s
user   0m0.140s
sys    0m0.040s

$ time cacheme sleep 1    # second time it looks for stdout in the cache (default expiry: 1h)
#DEBUG# Cache version found! (1 minute old)

real   0m0.100s
user   0m0.100s
sys    0m0.040s

This example is a bit silly because it has no output. Ideally it would be tested on a script like sleep-1-and-echo-hello-world.sh.

I created a small script that creates a file in /tmp/ keyed by a hash of the full command and the username, but I'm pretty sure something like this already exists.

Are you aware of anything like this?

Note: why would I do this? Occasionally I run commands that are network- or compute-intensive; they take minutes to run and the output doesn't change much. If I know this in advance, I just prepend cacheme to <cmd>, go for dinner, and when I'm back I can rerun the SAME command over and over on the same machine and get the same answer (meaning the same stdout) in an instant.

Riccardo
  • What's the use case ? Why would you want this ? – Brian Agnew Aug 10 '12 at 10:57
  • 1
    For instance, a script which takes ages to execute but normally gives consistent output, in cases in which you're not interested in having a fresh answer but just a good not-too-old version. – Riccardo Aug 10 '12 at 11:03
  • 1
    This is very close to what `make` does – Jo So Sep 18 '12 at 18:11
  • This question really needs to specify a shell and what 'output' means. In memoization, the inputs can be cmdline args, stdin, anything in printenv, or any other open file descriptor, and there are a large number of factors in 'output' as well. stdout, stderr, exit status, and other fd's? Don't want to memoize and leave out some important factors that people might be depending on. – Brian Chrisman Feb 16 '16 at 22:58
  • Brian > Edited. PTAL if I've addressed your 2 concerns – Riccardo Dec 23 '22 at 14:19

9 Answers

8

I improved the solution above somewhat by adding the expiry age as an optional argument.

#!/bin/sh
# save as e.g. $HOME/.local/bin/cacheme
# and then chmod u+x $HOME/.local/bin/cacheme
VERBOSE=false
PROG="$(basename "$0")"
DIR="${HOME}/.cache/${PROG}"
mkdir -p "${DIR}"
EXPIRY=600 # default to 10 minutes
# check if first argument is a number, if so use it as expiration (seconds)
[ "$1" -eq "$1" ] 2>/dev/null && EXPIRY=$1 && shift
[ "$VERBOSE" = true ] && echo "Using expiration $EXPIRY seconds"
# join the remaining arguments into one string; it is hashed and later eval'ed
CMD="$*"
HASH=$(echo "$CMD" | md5sum | awk '{print $1}')
CACHE="$DIR/$HASH"
# reuse the cache file if it exists and is younger than EXPIRY seconds, otherwise (re)run the command
test -f "${CACHE}" && [ "$(( $(date +%s) - $(date -r "$CACHE" +%s) ))" -le "$EXPIRY" ] || eval "$CMD" > "${CACHE}"
cat "${CACHE}"
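The script above caches only stdout. A minimal sketch of how stderr and the exit status could be kept as well, as the question asks. The function name `cache_run`, the `CACHE_DIR` variable and the three-file layout are assumptions made for illustration, not part of the script above, and expiry is left out for brevity:

```shell
#!/bin/sh
# Hypothetical helper: cache stdout, stderr and exit status of a command.
CACHE_DIR="${CACHE_DIR:-${TMPDIR:-/tmp}/cacheme-sketch-$(id -u)}"

cache_run() {
    mkdir -p "$CACHE_DIR" || return 1
    # One argument per line, so that 'echo "a b"' and 'echo a b' get different keys.
    key=$(printf '%s\n' "$@" | cksum | tr ' ' '_')
    base="$CACHE_DIR/$key"
    if [ ! -f "$base.status" ]; then
        # First run: store stdout and stderr in separate files.
        "$@" >"$base.out" 2>"$base.err"
        # Written last, so its presence marks a complete entry.
        echo "$?" >"$base.status"
    fi
    # Replay both streams to their original descriptors and reproduce the status.
    cat "$base.out"
    cat "$base.err" >&2
    return "$(cat "$base.status")"
}
```

Usage would look like `cache_run some-slow-command arg1 arg2`. Note that the relative ordering between stdout and stderr lines is lost, because the two streams are stored in separate files.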
error
  • 1. Maybe a usage would be nice: "yourscript.sh sleep 1" doesn't work. 2. I don't understand line 6: what is [ "$1" -eq "$1" ] for? – Riccardo Mar 23 '16 at 10:07
  • Added some comments for documentation. The "$1" -eq "$1" stuff is magic to detect if the first argument is an integer; if so, it is used to set the expiration time in seconds. Also added eval to make it work with arguments as in "yourscript.sh sleep 1". – error Apr 01 '16 at 08:36
  • 2
    This thing is great, has anyone made a proper util of it? Desired features, docs, garbage collection (currently the expired cache items are only deleted when overwritten), various flags, (use only cache, caching stderr, caching exit codes), and some POSIX script optimizations. – agc Jun 05 '16 at 14:49
  • 1
    Nice answer! A few suggestions for improvement: (1) Pipe the output into the "tee" command which allows it to be viewed real-time as well as stored in the cache. (2) Preserve colors (for example in commands like "ls --color") by using "script --flush --quiet /dev/null --command $CMD". (3) Avoid calling "eval" by using script as well. (4) Use "find" command rather than doing math on the dates (more reliable in corner cases). I'm not sure the Stack Overflow etiquette, but I'll post an updated solution implementing the above suggestions. – presto8 Feb 02 '19 at 16:14
5

I've implemented a simple caching script for bash, because I wanted to speed up plotting from a piped shell command in gnuplot. It can be used to cache the output of any command. The cache is used as long as the arguments are the same and the files passed as arguments haven't changed. The system is responsible for cleaning up, since the cache files live in /tmp.

#!/bin/bash

# start the cache key with all arguments
KEY="$*"

# append the last-modified time of every argument that is a file
for arg in "$@"
do
  if [ -f "$arg" ]
  then
    KEY+=$(date -r "$arg" +\ %s)
  fi
done

# use the first 10 hex digits of the key's MD5 hash to name the temporary file
FILE="/tmp/command_cache.$(echo -n "$KEY" | md5sum | cut -c -10)"

# use cached file or execute the command and cache it
if [ -f "$FILE" ]
then
  cat "$FILE"
else
  "$@" | tee "$FILE"
fi

You can name the script cache, set the executable flag and put it in your PATH. Then simply prefix any command with cache to use it.

Community
arekolek
4

Author of bash-cache here with an update. I recently published bkt, a CLI and Rust library for subprocess caching. Here's a simple example:

# Execute and cache an invocation of 'date +%s.%N'
$ bkt -- date +%s.%N
1631992417.080884000

# A subsequent invocation reuses the same cached output
$ bkt -- date +%s.%N
1631992417.080884000

It supports a number of features such as asynchronous refreshing (--stale and --warm), namespaced caches (--scope), and optionally keying off the working directory (--cwd) and select environment variables (--env). See the README for more.

It's still a work in progress but it's functional and effective! I'm using it already to speed up my shell prompt and a number of other common tasks.

dimo414
2

I created bash-cache, a memoization library for Bash, which works exactly how you're describing. It's designed specifically to cache Bash functions, but obviously you can wrap calls to other commands in functions.

It handles a number of edge-case behaviors that many simpler caching mechanisms miss. It reports the exit code of the original call, keeps stdout and stderr separate, and retains any trailing whitespace in the output (a $() command substitution, by contrast, strips trailing newlines).
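The trailing-newline point is easy to reproduce. The sketch below is not bash-cache's actual implementation, just the common sentinel-character idiom in plain sh; the function name `capture_exact` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: capture a command's stdout without losing trailing newlines.
# Result is left in $captured; the command's exit status is returned.
capture_exact() {
    # Append a sentinel 'x' so the newlines are no longer at the very end,
    # then strip the sentinel again.
    captured=$("$@"; rc=$?; printf x; exit "$rc")
    status=$?
    captured=${captured%x}
    return "$status"
}

plain=$(printf 'hello\n\n\n')          # command substitution drops all three newlines
capture_exact printf 'hello\n\n\n'     # $captured keeps them
printf '%s' "$plain" | wc -c           # 5 bytes
printf '%s' "$captured" | wc -c        # 8 bytes
```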

Demo:

# Define function normally, then decorate it with bc::cache
$ maybe_sleep() {
  sleep "$@"
  echo "Did I sleep?"
} && bc::cache maybe_sleep

# Initial call invokes the function
$ time maybe_sleep 1
Did I sleep?

real    0m1.047s
user    0m0.000s
sys     0m0.020s

# Subsequent call uses the cache
$ time maybe_sleep 1
Did I sleep?

real    0m0.044s
user    0m0.000s
sys     0m0.010s

# Invocations with different arguments are cached separately
$ time maybe_sleep 2
Did I sleep?

real    0m2.049s
user    0m0.000s
sys     0m0.020s

There's also a benchmark function that shows the overhead of the caching:

$ bc::benchmark maybe_sleep 1
Original:       1.007
Cold Cache:     1.052
Warm Cache:     0.044

So you can see the read/write overhead (on my machine, which uses tmpfs) is roughly 1/20th of a second. This benchmark utility can help you decide whether it's worth caching a particular call or not.

dimo414
1

How about this simple shell script (not tested)?

#!/bin/sh

mkdir -p cache

cachefile=cache/cache

for i in "$@"
do
    cachefile=${cachefile}_$(printf %s "$i" | sed 's/./\\&/g')
done

test -f "$cachefile" || "$@" > "$cachefile"
cat "$cachefile"
Jo So
  • it doesnt work :( cache: line 13: unexpected EOF while looking for matching `"' cache: line 14: syntax error: unexpected end of file – Riccardo Sep 18 '12 at 17:57
  • 1
    Corrected the obvious mistake (closing `"' in line 12) . Should work now. – Jo So Sep 18 '12 at 18:09
1

Improved upon the solution from error:

  • Pipes the output into the "tee" command, which allows it to be viewed in real time as well as stored in the cache.
  • Preserves colors (for example in commands like "ls --color") by using "script --flush --quiet /dev/null --command $CMD".
  • Avoids calling "eval" by using script as well.
  • Uses bash and [[.
    #!/usr/bin/env bash

    CMD="$*"
    [[ -z $CMD ]] && echo "usage: EXPIRY=600 cache cmd arg1 ... argN" && exit 1

    # set -e -x

    VERBOSE=false
    PROG="$(basename "$0")"

    EXPIRY=${EXPIRY:-600}  # default to 10 minutes, can be overridden
    EXPIRE_DATE=$(date -Is -d "-$EXPIRY seconds")

    [[ $VERBOSE = true ]] && echo "Using expiration $EXPIRY seconds"

    HASH=$(echo "$CMD" | md5sum | awk '{print $1}')
    CACHEDIR="${HOME}/.cache/${PROG}"
    mkdir -p "${CACHEDIR}"
    CACHEFILE="$CACHEDIR/$HASH"

    if [[ -e $CACHEFILE ]] && [[ $(date -Is -r "$CACHEFILE") > $EXPIRE_DATE ]]; then
        cat "$CACHEFILE"
    else
        script --flush --quiet --return /dev/null --command "$CMD" | tee "$CACHEFILE"
    fi
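One thing to watch with a `cmd | tee file` pipeline: its exit status is that of tee, so the output of a failed command is cached just like that of a successful one. A possible workaround, sketched here in plain POSIX sh under the assumption that failed runs should not be cached. The function name `tee_cache` and the `.tmp`/`.status` file names are made up for this example, and it uses plain tee without script, so colors are not preserved:

```shell
#!/bin/sh
# Hypothetical helper: show output live via tee, but keep the cache file
# only when the command succeeded.
tee_cache() {
    cache_file=$1
    shift
    # Record the command's own exit status in a side file, because the
    # pipeline as a whole reports the status of tee.
    { "$@"; echo "$?" >"$cache_file.status"; } | tee "$cache_file.tmp"
    status=$(cat "$cache_file.status")
    rm -f "$cache_file.status"
    if [ "$status" -eq 0 ]; then
        mv "$cache_file.tmp" "$cache_file"   # publish the complete output
    else
        rm -f "$cache_file.tmp"              # do not cache a failed run
    fi
    return "$status"
}
```

Writing to a temporary file and renaming it at the end also means that an interrupted run never leaves a half-written cache entry behind.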
presto8
0

The solution I came up with in Ruby is this. Does anybody see any possible optimizations?

#!/usr/bin/env ruby

VER = '1.2'
$time_cache_secs = 3600
$cache_dir = File.expand_path("~/.cacheme")

require 'rubygems'
begin
  require 'filecache'           # gem install filecache
rescue LoadError
  puts 'gem filecache requires installation, sorry. trying to install myself'
  system  'sudo gem install -r filecache'
  puts  'Try re-running the program now.'
  exit 1
end

=begin
  # create a new cache called "my-cache", rooted in /home/simon/caches
  # with an expiry time of 30 seconds, and a file hierarchy three
  # directories deep
=end
def main
  cache = FileCache.new("cache3", $cache_dir, $time_cache_secs, 3)
  cmd = ARGV.join(' ').to_s   # caching on full command, note that quotes are stripped
  cmd = 'echo give me an argument' if cmd.length < 1

  # caches the command and retrieves it
  if cache.get('output' + cmd)
    #deb "Cache found!(for '#{cmd}')"
  else
    #deb "Cache not found! Recalculating and setting for the future"
    cache.set('output' + cmd, `#{cmd}`)
  end
  #deb 'anyway calling the cache now'
  print(cache.get('output' + cmd))
end

main
Riccardo
0

An implementation exists here: https://bitbucket.org/sivann/runcached/src

It caches the executable path, output and exit code, and remembers arguments. Expiration is configurable. It is implemented in bash, C and Python; choose whichever suits you.

sivann
0

My solution is to use the find command:

#!/bin/sh

: ${CACHE=cache}
: ${EXPIRY=10} # minutes

long_def() {
    sleep 10
    echo finished
}

find "$CACHE" -type f -mmin +"$EXPIRY" -delete 2>/dev/null
[ -f "$CACHE" ] || long_def >"$CACHE"

cat "$CACHE"
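The same `find -mmin` test can also answer "is this cache file still fresh?" without deleting anything, which avoids date arithmetic. A small sketch; `is_fresh` is a hypothetical helper name, and `-mmin` is not available in every find implementation:

```shell
#!/bin/sh
# Hypothetical helper: succeed if file $1 exists and was modified
# less than $2 minutes ago.
is_fresh() {
    [ -n "$(find "$1" -type f -mmin -"$2" 2>/dev/null)" ]
}

# Possible use with the variables from the script above:
# is_fresh "$CACHE" "$EXPIRY" || long_def >"$CACHE"
```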
Ivan