When trying to turn some bash idioms that are common for me into turtle scripts, I still run into issues. This is a long post, sorry. You may skip the introductory explanations and jump to the actual issues towards the end, but I hope to get my questions across clearly this way.
One idiom I often use in bash scripting is chaining together (piping) find, egrep, and xargs with null-terminated strings. The reason is simple: even file names with spaces and other weird characters do not cause any problems this way.
I would use something like this:
find . -name "*" -print0 ... | egrep -z -Z ... | xargs -0 ...
Sometimes I want to process the matching files one at a time, with -L 1:
find . -name "*" -print0 ... | egrep -z -Z ... | xargs -0 -L 1 ...
Or, instead of xargs -0 ..., I would use another tool, like rsync with ssh, that understands null-terminated strings as well (-0).
For example, to synchronize/save the essential content of my current directory to some other directory, I would use something like:
binaries="exe$"
logfiles="log$"
pidfiles="pid$"
shakestuff="\_shake|\_build|\.\.database"
pat="^\.$|/dist|\.cabal-sandbox|cabal\.sandbox\.config|$shakestuff|\.o$|\.dyn_o$|\.hi$|\.dyn_hi$|\.hdevtools.sock$|$binaries|$logfiles|$pidfiles|TAGS"
find . -iname "*" -print0 -type f | egrep -z -Z -v "$pat" | rsync -a -e ssh --delete --progress --files-from=- -0 ./ .../path/to/some/other/dir
find prints all the files in the current directory, null terminated: -print0
egrep -v "$pat" keeps only those entries of this list that do not match the pattern $pat, i.e. only the essential files: I don't bother synchronizing/saving files in the .cabal-sandbox directory, for example. Being in the middle of the chain, egrep has to consume as well as produce null-terminated strings here: -z -Z. The pattern pat is assembled beforehand, piece by piece.
rsync with ssh is instructed here to get its input from stdin: --files-from=-, again null-terminated: -0 (note that while in general "rsync ... from to" behaves very differently depending on whether the directory from is given with a trailing slash, as here with ./, or not, this is less important here, as the input to rsync is coming from stdin: -)
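The chain described above can be tried out in isolation. The following is a minimal bash sketch under assumptions of my own: the scratch directory, the four file names and the shortened pattern are made up for illustration, and grep -E stands in for egrep. It also puts -type f before -print0, because find evaluates its expression left to right and would otherwise print every entry before testing its type.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Scratch directory with a few hypothetical file names, one containing a space.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"
touch "plain.hs" "with space.hs" "server.log" "Main.o"

# The pattern is assembled piece by piece, as in the post (shortened here).
logfiles='\.log$'
objects='\.o$'
pat="$logfiles|$objects"

# find emits NUL-terminated names, grep -z filters them record by record,
# and xargs -0 hands each surviving name to printf, one name per call.
kept=$(find . -type f -print0 \
  | grep -E -z -v "$pat" \
  | xargs -0 -n 1 printf '%s\n' \
  | LC_ALL=C sort)

printf '%s\n' "$kept"
```

The name containing a space travels through the whole chain as a single record, which is the point of using null-terminated strings in the first place.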
I have now tried to turn this into a turtle script, with some success, but I am still facing some issues and would like to make it more idiomatic turtle.
For the sake of completeness, here is my currently working script in a file sync.hs, which is run with the help of a little runturtle shell script. I can call sync.hs
to either just show the list of files being considered: sync.hs -e
or sync them to another dir like so: sync.hs --to /path/to/other/dir
Here is the code (runturtle):
#!/bin/sh
exec cabal exec runhaskell -- "$@"
Here is the code (sync.hs):
#!/usr/bin/env runturtle
{-# LANGUAGE OverloadedStrings #-}
-- {-# LANGUAGE ExtendedDefaultRules #-}
{-# OPTIONS_GHC -fno-warn-type-defaults #-}
import Turtle
data Opts = Opts {
    doEcho :: Bool
  , toDir :: Turtle.FilePath
  }
  deriving (Show)

parser :: Parser Opts
parser = Opts <$>
      (switch "echo" 'e' "echo the files considered for synchronizing")
  <*> (optPath "to" 't' "sync to dir")

binaries="|\\./website$|srv$"
logfiles="|log$"
pidfiles="|pid$|pnm$"
shakestuff="|_shake|_build|\\.\\.database"
pat="^\\.$"
  <>"|/dist|\\.cabal-sandbox|cabal\\.sandbox\\.config"
  <> shakestuff
  <>"|\\.git|\\.o$|\\.dyn_o$|\\.hi$|\\.dyn_hi$|\\.hdevtools.sock$"
  <> binaries
  <> logfiles
  <> pidfiles
  <>"|TAGS"

sync :: Opts -> IO ()
sync opts = do {
  ; echo "syncing..."
  ; when (doEcho opts)
      (do {
        ; echo $ "pat: " <> pat
        ; sh $ do inproc "find" [".", "-iname", "*", "-print0", "-type", "f"] empty
                    & inproc "egrep" ["-z", "-Z" , "-v", pat]
                    & inproc "xargs" ["-0", "-L", "1"]
                    & grep (has ".")
                    >>= echo
        ; exit ExitSuccess
        })
  ; do {
      ; let txt = "find . -iname \"*\" -print0 -type f | egrep -z -Z -v \"" <> pat <>"\" | rsync -a -e ssh --delete --progress --files-from=- -0 ./ "
                    <> format fp (toDir opts)
      ; echo txt
      ; shell txt empty
      ; return ()
      }
  ; return ()
  }

main :: IO ()
main = (do {
  ; opts <- options "sync file to another directory" parser
  ; print (opts)
  ; sync opts
  ; return ()
  })
Now here are my issues with this script:
First of all: I can run this on the command line, and my flycheck syntax checking in emacs, relying on either ghc or hdevtools, works fine, so I now get the benefits of Haskell's strong typing for shell scripts (thanks for creating turtle, by the way). I can even use turtle interactively (cabal repl):
cabal repl
> :set -XOverloadedStrings
> import Turtle
> ls "."
> view (shell "whatever cmd" empty)
etc., but if I load my sync.hs script, I cannot access its pieces (the functions defined in sync.hs):
> :l sync.hs
[1 of 1] Compiling Main ( sync.hs, interpreted )
Ok, modules loaded: Main.
I would like to see the pattern defined above, for example:
> pat
<interactive>:12:1:
Not in scope: ‘pat’
Perhaps you meant ‘cat’ (imported from Turtle)
I would like to use the functions defined in sync.hs as shortcuts for experimenting, e.g. like this:
> view $ inproc "find" [".", "-iname", "*", "-print0", "-type", "f"] empty & inproc "egrep" ["-z", "-Z" , "-v", pat]
<interactive>:15:111:
Not in scope: ‘pat’
Perhaps you meant ‘cat’ (imported from Turtle)
Second, you may have noticed in my turtle script above that I have used "more idiomatic" turtle in the case of echo:
; sh $ do inproc "find" [".", "-iname", "*", "-print0", "-type", "f"] empty
            & inproc "egrep" ["-z", "-Z" , "-v", pat]
            & inproc "xargs" ["-0", "-L", "1"]
            & grep (has ".")
            >>= echo
i.e. I am using the turtle style of piping: function application, here in reverse order with &. This is at least more idiomatic than in the case of toDir, where I am actually relying on bash to do the job:
; let txt = "find . -iname \"*\" -print0 -type f | egrep -z -Z -v \"" <> pat <>"\" | rsync -a -e ssh --delete --progress --files-from=- -0 ./ "
              <> format fp (toDir opts)
; echo txt
; shell txt empty
But even in this more idiomatic case of echo, I had to resort to a workaround: grep (has "."). If I don't use it, I get to see empty strings:
turtle> view $ inproc "find" [".", "-iname", "*", "-print0", "-type", "f"] empty & inproc "egrep" ["-z", "-Z" , "-v", "\\.cabal-sandbox|/dist"]
output (lots of output omitted here, but see the single "\NUL" at the very end):
"...ntax.hs\NUL./static/lib-pi-forall/src/PiForall/Parser.hs\NUL./static/lib-pi-forall/src/PiForall/TypeCheck.hs\NUL./static/lib-pi-forall/LICENSE\NUL./shclean.do\NUL./TAGS\NUL./T10.hs\NUL./todo-yet-stop-the-program-as-in-running-if-not-told-another\NUL./talks\NUL./index.html\NUL./T1.hs.orig\NUL./sbbuild.sh\NUL./_shake\NUL./_shake/Main.hi\NUL./_shake/Main.dyn_o\NUL./_shake/build\NUL./_shake/Main.o\NUL./_shake/Main.dyn_hi\NUL./T4.hs\NUL./sync.hs\NUL./etc\NUL./.hdevtools.sock\NUL./more-stuff.hs\NUL./my.hs\NUL./T9.hs\NUL./snap-index\NUL./T6.hs\NUL./etc.html\NUL./cabalfile.hs\NUL./todo-maybe-issue-start-stop-restart-july2016\NUL./try-turtle-urwclassico.do\NUL./install.do\NUL./update-rc\NUL./index\NUL./done-pipe\NUL./clean.do\NUL./bootstrap.do\NUL./mystuff.cabal\NUL./pire\NUL./log\NUL./build.sh\NUL./goodsync.hs\NUL./cmds.hs\NUL./LICENSE\NUL./dry.do\NUL./T5.hs\NUL./snap-pire\NUL"
"\NUL"
See the empty strings I get at the end if I don't bother to remove them with grep (has "."):
turtle> view $ inproc "find" [".", "-iname", "*", "-print0", "-type", "f"] empty & inproc "egrep" ["-z", "-Z" , "-v", "\\.cabal-sandbox|/dist"] & inproc "xargs" ["-0", "-L", "1"]
(again lots of output omitted)
"./done-pipe"
"./clean.do"
"./bootstrap.do"
"./mystuff.cabal"
"./pire"
"./log"
"./build.sh"
"./goodsync.hs"
"./cmds.hs"
"./LICENSE"
"./dry.do"
"./T5.hs"
"./snap-pire"
""
""
""
""
turtle>
Why is this? In bash I don't have to do this! Any better / recommended way to use null terminated strings in turtle?
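A possible explanation, offered as an unverified assumption and not checked against turtle's source: if the turtle pipeline passes text between the processes line by line and appends a newline to what it writes, the null-terminated list would gain a trailing newline, and that newline would then count as one more null-delimited record. The bash sketch below only reproduces that effect outside turtle, using grep -z -c '' to count records; the two-name input is made up.

```shell
#!/usr/bin/env bash
set -euo pipefail

# A NUL-terminated list of two hypothetical names, as find -print0 would emit it.
clean_count=$(printf 'a\0b\0' | grep -z -c '')

# The same list with one trailing newline appended, which is what a
# line-oriented writer might add (an assumption about the cause, not verified).
stray_count=$(printf 'a\0b\0\n' | grep -z -c '')

echo "records without trailing newline: $clean_count"
echo "records with trailing newline:    $stray_count"
```

With GNU grep I would expect the second count to be one higher than the first, the extra record consisting of a single newline. That would also fit the \#012 in the rsync error shown further down, since 012 is the octal code of the newline character.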
And last but not least, I couldn't come up with an idiomatic turtle solution for the other piece of code, the rsync one. Here is an attempt, but see what happens: some files are transferred, but rsync complains that it cannot find my current dir /home/rx/work/servant/ in its null-terminated input: link_stat "/home/rx/work/servant/\#012" failed (well yes: its name is simply "/home/rx/work/servant/", not "/home/rx/work/servant/\#012")
; view $ inproc "find" [".", "-iname", "*", "-print0", "-type", "f"] empty
           & inproc "egrep" ["-z", "-Z", "-v", pat]
           & grep (has ".")
           & shell ("rsync -a -e ssh --delete --progress --files-from=- -0 ./ " <> (format fp $ toDir opts))
rx@softland ~/work/servant $ ./sync.hs --to ~/tmp/website_
Opts {doEcho = False, toDir = FilePath "/home/rx/tmp/website_"}
syncing...
building file list ...
rsync: link_stat "/home/rx/work/servant/\#012" failed: No such file or directory (2)
135 files to consider
./
q
8,715 100% 0.00kB/s 0:00:00 (xfr#1, to-chk=95/135)
sync.hs
2,034 100% 1.94MB/s 0:00:00 (xfr#2, to-chk=86/135)
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1183) [sender=3.1.1]
ExitFailure 23
rx@softland ~/work/servant $
But really I would like to use inproc even for the rsync piece (with or without grep (has ".")):
; view $ inproc "find" [".", "-iname", "*", "-print0", "-type", "f"] empty
           & inproc "egrep" ["-z", "-Z", "-v", pat]
           & grep (has ".")
           & inproc "rsync" ["-a", "-e", "ssh", "--delete", "--progress", "--files-from=-", "-0", "./", format fp $ toDir opts]
rx@softland ~/work/servant $ ./sync.hs --to ~/tmp/website_
Opts {doEcho = False, toDir = FilePath "/home/rx/tmp/website_"}
syncing...
"building file list ... "
rsync: link_stat "/home/rx/work/servant/\#012" failed: No such file or directory (2)
" 0 files...\r 100 files...\r137 files to consider"
"./"
"sync.hs"
"\r 2,053 100% 0.00kB/s 0:00:00 \r 2,053 100% 0.00kB/s 0:00:00 (xfr#1, to-chk=86/137)"
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1183) [sender=3.1.1]
rx@softland ~/work/servant $
Thanks in advance.